It's not easy to find the log files of a custom-cluster test. All
custom-cluster tests use the same log dir, and the test output only
shows the symlinks of the log files, e.g. "Starting State Store logging
to .../logs/custom_cluster_tests/statestored.INFO".
This patch prints the actual log file names after the cluster launches.
An example output:
15:17:19 MainThread: Starting State Store logging to /tmp/statestored.INFO
15:17:19 MainThread: Starting Catalog Service logging to /tmp/catalogd.INFO
15:17:19 MainThread: Starting Impala Daemon logging to /tmp/impalad.INFO
15:17:19 MainThread: Starting Impala Daemon logging to /tmp/impalad_node1.INFO
15:17:19 MainThread: Starting Impala Daemon logging to /tmp/impalad_node2.INFO
...
15:17:24 MainThread: Total wait: 2.54s
15:17:24 MainThread: Actual log file names:
15:17:24 MainThread: statestored.INFO -> statestored.quanlong-Precision-3680.quanlong.log.INFO.20251216-151719.1094348
15:17:24 MainThread: catalogd.INFO -> catalogd.quanlong-Precision-3680.quanlong.log.INFO.20251216-151719.1094368
15:17:24 MainThread: impalad.INFO -> impalad.quanlong-Precision-3680.quanlong.log.INFO.20251216-151719.1094466
15:17:24 MainThread: impalad_node1.INFO -> impalad.quanlong-Precision-3680.quanlong.log.INFO.20251216-151719.1094468
15:17:24 MainThread: impalad_node2.INFO -> impalad.quanlong-Precision-3680.quanlong.log.INFO.20251216-151719.1094470
15:17:24 MainThread: Impala Cluster Running with 3 nodes (3 coordinators, 3 executors).
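The mapping shown above is just the symlink target of each *.INFO file
in the log dir. A minimal sketch of how it could be produced, using
only standard os APIs (the helper name is illustrative, not the actual
function added by this patch):

import os

def print_actual_log_names(log_dir):
  # Resolve each *.INFO symlink to the rotated glog file it points to.
  for name in sorted(os.listdir(log_dir)):
    path = os.path.join(log_dir, name)
    if name.endswith(".INFO") and os.path.islink(path):
      print("%s -> %s" % (name, os.readlink(path)))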
Tests
- Ran the script locally.
- Ran a failed custom-cluster test and verified the actual file names
are printed in the output.
Change-Id: Id76c0a8bdfb221ab24ee315e2e273abca4257398
Reviewed-on: http://gerrit.cloudera.org:8080/23781
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Quanlong Huang <huangquanlong@gmail.com>
When the --use_calcite_planner=true option is set at the server level,
the queries will no longer go through CalciteJniFrontend. Instead, they
will go through the regular JniFrontend, which is the path that is used
when the query option for "use_calcite_planner" is set.
The CalciteJniFrontend will be removed in a later commit.
This commit also enables fallback to the original planner when an
unsupported-feature exception is thrown. This was needed to allow the
tests to run properly: during the initial database load, there are
queries that access complex columns, which throw the unsupported
exception.
Change-Id: I732516ca8f7ea64f73484efd67071910c9b62c8f
Reviewed-on: http://gerrit.cloudera.org:8080/23523
Reviewed-by: Steve Carlin <scarlin@cloudera.com>
Tested-by: Steve Carlin <scarlin@cloudera.com>
This patch adds an option in bin/start-impala-cluster.py to start the
Impala cluster with Ranger authorization enabled.
Tests
- Manually tested the script and verified Ranger authz is enabled.
Change-Id: I62d6f75fdfcf6e0c3807958e2aae4405054eef8a
Reviewed-on: http://gerrit.cloudera.org:8080/23138
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-13850 changed the behavior of bin/start-impala-cluster.py to wait
for the number of tables to be at least one. This is needed to detect
that the catalog has seen at least one update. There is special logic in
dataload to start Impala without tables in that circumstance.
This broke the perf-AB-test job, which starts Impala before loading
data. There are other times when we want to start Impala without tables,
and it is inconvenient to need to specify --wait_num_tables each time.
It is actually not necessary to wait for the Coordinator's catalog
metric to reach a certain value. The Frontend (Coordinator) will not
open its service ports until it has heard the first catalog topic
update from CatalogD. IMPALA-13850 (part 2) also ensures that a
CatalogD with --catalog_topic_mode=minimal blocks serving Coordinator
requests until it begins its first reset() operation. Therefore,
waiting for the Coordinator's catalog version is no longer needed and
the --wait_num_tables parameter can be removed.
This patch also slightly changes the "progress log" of
start-impala-cluster.py to print the Coordinator's catalog version
instead of the number of cached DBs and tables. The sleep interval now
includes the time spent checking the Coordinator's metrics.
Testing:
- Pass dataload with updated script.
- Manually run start-impala-cluster.py in both legacy and local catalog
mode and confirm it works.
- Pass custom cluster test_concurrent_ddls.py and test_catalogd_ha.py
Change-Id: I4a3956417ec83de4fb3fc2ef1e72eb3641099f02
Reviewed-on: http://gerrit.cloudera.org:8080/22994
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
In HA mode, CatalogD initialization can fail to complete within a
reasonable time. Log messages showed that CatalogD is blocked trying to
acquire "CatalogServer.catalog_lock_" when calling
CatalogServer::UpdateActiveCatalogd() during statestore subscriber
registration. catalog_lock_ was held by GatherCatalogUpdatesThread,
which was calling GetCatalogDelta(), which in turn waits for the Java
lock versionLock_, held by the thread doing CatalogServiceCatalog.reset().
This patch removes the catalog reset from the JniCatalog constructor.
In turn, catalogd-server.cc is now responsible for triggering the
metadata reset (Invalidate Metadata), but only if:
1. It is the active CatalogD, and
2. The gathering thread has collected the first topic update, or
CatalogD is set with a catalog_topic_mode other than "minimal".
The latter prerequisite ensures that coordinators are not blocked
waiting for a full topic update in on-demand metadata mode. This is all
managed by a new thread method, TriggerResetMetadata, that monitors and
triggers the initial metadata reset.
Note that this is a behavior change in on-demand catalog
mode (catalog_topic_mode=minimal). Previously, on-demand catalog mode
would send the full database list in its first catalog topic update.
This behavior change is OK since coordinators can request metadata on
demand.
After this patch, catalog-server.active-status and the /healthz page
can turn to true and OK, respectively, even if the very first metadata
reset is still ongoing. Observers that care about fully populated
metadata should check other metrics such as catalog.num-db,
catalog.num-tables, or the /catalog page content.
Updated the start-impala-cluster.py readiness check to wait for at
least 1 table to be seen by coordinators, except during
create-load-data.sh execution (there are no tables yet) and when
use_local_catalog=true (the local catalog cache does not start with any
tables). Modified startup flag checking from reading the actual command
line args to reading the '/varz?json' page of the daemon. Cleaned up
impala_service.py to fix some flake8 issues.
Slightly updated TestLocalCatalogCompactUpdates::test_restart_catalogd
so that unique_database cleanup succeeds.
Testing:
- Refactor test_catalogd_ha.py to reduce repeated code, use
unique_database fixture, and additionally validate /healthz page of
both active and standby catalogd. Changed it to test using hs2
protocol by default.
- Run and pass test_catalogd_ha.py and test_concurrent_ddls.py.
- Pass core tests.
Change-Id: I58cc66dcccedb306ff11893f2916ee5ee6a3efc1
Reviewed-on: http://gerrit.cloudera.org:8080/22634
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
This patch adds initial support for Iceberg REST Catalogs. This means
now it's possible to run an Impala cluster without the Hive Metastore,
and without the Impala CatalogD. Impala Coordinators can directly
connect to an Iceberg REST server and fetch metadata for databases and
tables from there. The support is read-only, i.e. DDL and DML statements
are not supported yet.
This was initially developed in the context of a company Hackathon
program, i.e. it was a team effort that I squashed into a single commit
and polished the code a bit.
The Hackathon team members were:
* Daniel Becker
* Gabor Kaszab
* Kurt Deschler
* Peter Rozsa
* Zoltan Borok-Nagy
The Iceberg REST Catalog support can be configured via a Java
properties file; its location can be specified via:
--catalog_config_dir: Directory of configuration files
Currently only one configuration file can be in the directory, as we
only support a single Catalog at a time. The following properties are
mandatory in the config file:
* connector.name=iceberg
* iceberg.catalog.type=rest
* iceberg.rest-catalog.uri
The first two properties can only be 'iceberg' and 'rest' for now; they
are needed for extensibility in the future.
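For illustration, a minimal config file could look like the following
(the file name and the URI value are examples, not taken from this
patch):

# <catalog_config_dir>/iceberg-rest.properties
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=http://localhost:9001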
Moreover, Impala Daemons need to specify the following flags to connect
to an Iceberg REST Catalog:
--use_local_catalog=true
--catalogd_deployed=false
Testing
* e2e tests added to test basic functionality against a custom-built
Iceberg REST server that delegates to HadoopCatalog under the hood
* Further testing, e.g. Ranger tests, is expected in subsequent
commits
TODO:
* manual testing against Polaris / Lakekeeper, we could add automated
tests in a later patch
Change-Id: I1722b898b568d2f5689002f2b9bef59320cb088c
Reviewed-on: http://gerrit.cloudera.org:8080/22353
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Downstream system vendors, users and customers have lately expressed
interest in consuming Impala in containerized forms, taking advantage of
various specialized, hardened container base image offerings, like
container offerings based on the Wolfi project by Chainguard;
see: https://github.com/wolfi-dev.
This patch enables Impala container images to be built on top of custom
base images, and adds an implementation example that uses the publicly
available Wolfi base image.
Building a customized Docker image follows a hybrid approach. Instead of
replicating the complete Impala build process inside a Wolfi container
for a fully native binary build, it relies on an existing build platform
that is compatible with the binary packages available inside the custom
container image. For Wolfi the Impala binaries are supplied by the
Red Hat 9 build of Impala. This is made possible by the fact that major
library dependencies of Impala have the same versions on Wolfi OS and
Red Hat 9, so binaries built on Red Hat 9 can be run on Wolfi
with no changes.
The binaries produced by the regular build process are then installed
into a Docker image built on top of an explicitly specified custom base
image. The selection of a custom base image is controlled by two
environment variables:
- USE_CUSTOM_IMPALA_BASE_IMAGE (boolean):
If set to 'true', triggers the use of the custom image.
When set to 'false' or left unspecified, the Docker base image is
selected by the existing logic of matching the build platform's
operating system.
- IMPALA_CUSTOM_DOCKER_BASE (string): specifies the URI of the base image
These environment variables can be overridden from the environment,
from impala-config-branch.sh, or impala-config-local.sh.
They are reported at the end of bin/impala-config.sh where important
environment variables are listed. They are also added to the list of
variables in bin/jenkins/dockerized-impala-preserve-vars.py to ensure
that they can be used in the context of Jenkins jobs as well.
The unified script that installs Impala's required dependencies into the
container image is extended for Wolfi to handle APK packages.
A new script is added to install Bash in the Docker image if it is
missing. Impala build scripts (including the scripts used during Docker
image builds) as well as container startup scripts require Bash,
but minimal container base images usually omit it, favoring a smaller
alternative.
To improve the debugging experience for a containerized Impala
minicluster, the minicluster starter script bin/start-impala-cluster.py
is extended with the following features:
- synchronizes every launched container's timezone to the host's (see
the sketch after this list).
This is needed for the Iceberg time-travel tests, which create
timestamped Iceberg metadata items in the impalad context inside a
container, but check creation/modification times of the same items in
the test scripts running on the host, outside the containers. The test
scripts implicitly expect that the same local time is shared across
all these contexts, but this is not necessarily true if the host where
the tests are running is set to a timezone other than UTC.
Time synchronization is achieved by injecting the TZ environment
variable into the container, holding the name of the timezone used
on the host. The timezone name is taken either from the host's TZ
variable (if set), or from the host's /etc/localtime symlink,
checking the name of the timezone file it points to.
In case /etc/localtime is not a symlink (and TZ is not set on the
host), the host's /etc/localtime file is mounted into the container.
- sets up a directory for each container to collect the Java VM error
files (hs_err_pidNNNN.log) from the containers.
- adds the --mount_sources command line parameter, which mounts the
complete $IMPALA_HOME subtree into the container at
/opt/impala/sources to make source code available inside the container
for easier debugging.
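A rough Python sketch of the timezone detection described in the first
item above (the function name is illustrative; the fallback to mounting
/etc/localtime is handled by the caller):

import os

def host_timezone_name():
  # Prefer the host's TZ variable; otherwise derive the zone name from
  # the /etc/localtime symlink target, e.g. .../zoneinfo/Europe/Budapest.
  tz = os.environ.get("TZ")
  if tz:
    return tz
  if os.path.islink("/etc/localtime"):
    target = os.readlink("/etc/localtime")
    if "zoneinfo/" in target:
      return target.split("zoneinfo/", 1)[1]
  # Neither available: the caller mounts /etc/localtime into the
  # container instead of setting TZ.
  return None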
Tested by running core-mode tests in the following environments:
- Regular run (impalad running natively on the platform) on Ubuntu 20.04
- Regular run on Rocky Linux 9.2
- Dockerised run (impalad instances running in their individual
containers) using Ubuntu 20.04 containers
- Dockerised run (impalad instances running in their individual
containers) using Rocky Linux 9.2 containers
- Dockerised run (impalad instances running in their individual
containers) using Wolfi's wolfi-base containers
Change-Id: Ia5e39f399664fe66f3774caa316ed5d4df24befc
Reviewed-on: http://gerrit.cloudera.org:8080/22583
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reverse DNS lookup for the Docker container's internal gateway (routing
traffic between code running inside the container and code running
natively on the Docker host) happens differently on various operating
systems. Some recent platforms, like RHEL 9, resolve this address to the
default name _gateway. Unfortunately the Java Thrift library within
Impala's frontend considers the underscore character invalid in DNS
names, so it throws an error, preventing the Impala coordinator from
connecting to HMS. This kills Impala on startup, blocking any testing
efforts inside containers.
To avoid this problem, this patch adds explicit entries to the
container's /etc/hosts file for the gateway's address as well as the
Docker host network. The name doesn't really matter, as it is used only
by Thrift's logging code, so the mapping uses the constant generic name
'gateway'.
The IP address of the gateway is retrieved from the environment variable
INTERNAL_LISTEN_HOST, which is set up by docker/configure_test_network.sh
before the Impala containers are launched.
Tested by a dockerised test run executed on Rocky Linux 9.2, using Rocky
9.2 Docker base images for the Impala containers.
Change-Id: I545607c0bb32f8043a0d3f6045710f28a47bab99
Reviewed-on: http://gerrit.cloudera.org:8080/22438
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Queries that run only against in-memory system tables are currently
subject to the same admission control process as all other queries.
Since these queries do not use any resources on executors, admission
control does not need to consider the state of executors when
deciding to admit these queries.
This change adds a boolean configuration option 'onlyCoordinators'
to the fair-scheduler.xml file for specifying that a request pool
applies only to the coordinators. When a query is submitted to a
coordinator-only request pool, no executors are required to be
running. Instead, all fragment instances are executed exclusively on
the coordinators.
A new member was added to the ClusterMembershipMgr::Snapshot struct
to hold the ExecutorGroup of all coordinators. This object is kept up
to date by processing statestore messages and is used when executing
queries that either require the coordinators (such as queries against
sys.impala_query_live) or that use an only coordinators request pool.
Testing was accomplished by:
1. Adding cluster membership manager ctests to assert cluster
membership manager correctly builds the list of non-quiescing
coordinators.
2. RequestPoolService JUnit tests to assert the new optional
<onlyCoords> config in the fair scheduler xml file is correctly
parsed.
3. ExecutorGroup ctests modified to assert the new function.
4. Custom cluster admission controller tests to assert queries with a
coordinator-only request pool run only on the active coordinators.
Change-Id: I5e0e64db92bdbf80f8b5bd85d001ffe4c8c9ffda
Reviewed-on: http://gerrit.cloudera.org:8080/22249
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This introduces the IMPALA_USE_PYTHON3_TESTS environment variable
to select whether to run tests using the toolchain Python 3.
This is an experimental option, so it defaults to false,
continuing to run tests with Python 2.
This fixes a first batch of Python 2 vs 3 issues:
- Deciding whether to open a file in bytes mode or text mode
- Adapting to APIs that operate on bytes in Python 3 (e.g. codecs)
- Eliminating 'basestring' and 'unicode' locations in tests/ by using
the recommendations from future
( https://python-future.org/compatible_idioms.html#basestring and
https://python-future.org/compatible_idioms.html#unicode )
- Using impala-python3 for bin/start-impala-cluster.py
All fixes leave the Python 2 path working normally.
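For example, the basestring/unicode cleanups follow the future-style
idioms linked above, roughly like this (an illustrative sketch, not the
exact call sites in tests/):

from builtins import str as text_type  # stands in for Python 2's `unicode`
from past.builtins import basestring   # transitional alias usable on 2 and 3

def is_text(value):
  return isinstance(value, text_type)   # replaces isinstance(value, unicode)

def is_stringlike(value):
  return isinstance(value, basestring)  # keeps old call sites working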
Testing:
- Ran an exhaustive run with Python 2 to verify nothing broke
- Verified that the new environment variable works and that
it uses Python 3 from the toolchain when specified
Change-Id: I177d9b8eae9b99ba536ca5c598b07208c3887f8c
Reviewed-on: http://gerrit.cloudera.org:8080/21474
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
The change reduces cluster startup time by 1-2 seconds. This also
makes custom cluster tests a bit quicker.
Most of the improvement comes from removing an unneeded sleep from
wait_for_catalog() - it also slept after successful connections, even
though once the first coordinator is up, it is likely that all others
are also up, which meant 3*0.5s of extra sleep in the dev cluster.
Other changes:
- wait_for_catalog is cleaned up and renamed to
wait_for_coordinator_services
- also wait for hs2_http port to be open
- decreased some sleep intervals
- removed some non-informative logging
- wait for hs2/beeswax/webui ports to be open before trying
to actually connect to them, to avoid extra logging from
failed Thrift/http connections (see the sketch after this list)
- reordered startup to first wait for coordinators to be up
then wait for num_known_live_backends in each impalad - this
reflects better what the cluster actually waits for (1st catalog
update before starting coordinator services)
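A minimal sketch of the kind of port-open check referenced in the list
above (the function and parameter names are illustrative, not the
actual helpers in start-impala-cluster.py):

import socket
import time

def wait_for_port(host, port, timeout_s=60.0, interval_s=0.1):
  # Poll until the TCP port accepts connections; raise if the deadline
  # passes without a successful connect.
  deadline = time.time() + timeout_s
  while time.time() < deadline:
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(interval_s)
    try:
      if sock.connect_ex((host, port)) == 0:
        return
    finally:
      sock.close()
    time.sleep(interval_s)
  raise RuntimeError("%s:%d not open after %.1fs" % (host, port, timeout_s))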
Change-Id: Ic4dd8c2bc7056443373ceb256a03ce562fea38a0
Reviewed-on: http://gerrit.cloudera.org:8080/21656
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
The patch adds a feature to the automated correctness check for the
tuple cache. The purpose of this feature is to verify the correctness
of the tuple cache by comparing cache entries with the same key across
different queries.
The feature consists of two main components: cache dumping and
runtime correctness validation.
During the cache dumping phase, if a tuple cache is detected, we
retrieve the cache entry from the global cache and dump it as a
reference file into a subdirectory of the specified debug dumping
directory. The subdirectory uses the cache key as its name.
Additionally, data from the child is also read and dumped to a
separate file in the same directory. We expect these two files to be
identical, assuming the results are deterministic. Non-deterministic
cases like TOP-N may later be detected and excluded from dumping.
Furthermore, the cache data is transformed into a human-readable text
format on a row-by-row basis before dumping. This approach allows for
easier investigation and later analysis.
The verification process starts by comparing the entire content of
the files sharing the same key. If the content matches, the
verification is considered successful. If the content doesn't match,
we enter a slower mode where we compare all the rows individually. In
the slow mode, we build a hash map from the reference cache file, then
iterate through the current cache file row by row and check whether
every row exists in the hash map. A counter is integrated into the
hash map to handle scenarios involving duplicated rows. Once
verification is complete, if no discrepancies are found, both files
are removed. If discrepancies are detected, the files are kept and a
'.bad' suffix is appended.
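The row-level comparison is conceptually like the following Python
sketch (the real logic lives in the backend; Counter stands in for the
hash map with duplicate counting):

from collections import Counter

def rows_match(reference_rows, current_rows):
  # Fast path: identical row sequences mean the cache entry is consistent.
  if reference_rows == current_rows:
    return True
  # Slow path: order-insensitive comparison that handles duplicate rows
  # by counting how often each row occurs.
  return Counter(reference_rows) == Counter(current_rows)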
New flags and options:
Added a startup flag tuple_cache_debug_dump_dir for specifying the
directory for dumping the result caches. If tuple_cache_debug_dump_dir
is empty, the feature is disabled.
Added a query option enable_tuple_cache_verification to enable or
disable the tuple cache verification. The default is true. It is only
valid when tuple_cache_debug_dump_dir is specified.
Tests:
Ran the testcase test_tuple_cache_tpc_queries and caught known
inconsistencies.
Change-Id: Ied074e274ebf99fb57e3ee41a13148725775b77c
Reviewed-on: http://gerrit.cloudera.org:8080/21754
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
To share catalogd and statestore across Impala clusters, this adds the
cluster id to the membership and request-queue topic names, so impalads
are only visible to each other inside the same cluster, i.e. when using
the same cluster id. Note that impalads still subscribe to the same
catalog-update topic so they can share the same catalog service.
If the cluster id is empty, the original topic names are used.
This also adds the non-empty cluster id as a prefix of the statestore
subscriber id for impalad and admissiond.
Tests:
- Add custom cluster test
- Ran exhaustive tests
Change-Id: I2ff41539f568ef03c0ee2284762b4116b313d90f
Reviewed-on: http://gerrit.cloudera.org:8080/21573
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This is the first commit to use the Calcite library to parse,
analyze, and optimize queries.
The hook for the planner is through an override of the JniFrontend. The
CalciteJniFrontend class is the driver that walks through each of the
Calcite steps which are as follows:
CalciteQueryParser: Takes the query string and outputs an AST in the
form of Calcite's SqlNode object.
CalciteMetadataHandler: Iterates through the SqlNode from the previous
step and makes sure all essential table metadata is retrieved from
catalogd.
CalciteValidator: Validates the SqlNode tree, akin to the Impala Analyzer.
CalciteRelNodeConverter: Converts the AST into a logical plan. In this
first commit, the only logical nodes used are LogicalTableScan and
LogicalProject. The LogicalTableScan serves as the node that reads from
an HDFS table, and the LogicalProject only projects out the columns used
in the query. In later versions, the LogicalProject will also handle
function changes.
CalciteOptimizer: This step optimizes the query. In this cut, it is a
no-op, but in later versions it will perform logical optimizations via
Calcite's rule mechanism.
CalcitePhysPlanCreator: Converts the Calcite RelNode logical tree into
Impala's PlanNode physical tree.
ExecRequestCreator: Implements the existing Impala steps that turn a
Single Node Plan into a Distributed Plan. It also creates the
TExecRequest object needed by the runtime server.
Only some very basic queries will work with this commit. These include:
select * from tbl <-- only needs the LogicalTableScan
select c1 from tbl <-- Also uses the LogicalProject
In the CalciteJniFrontend, there are some basic checks to make sure
only select statements get processed. Any non-query statement will
revert to the current Impala planner.
In this iteration, any query besides the minimal ones listed above will
result in a caught exception, and the query will then be run through
the current Impala planner. The tests that do work can be found in
calcite.test and run through the custom cluster test
test_experimental_planner.py.
This iteration should support all types with the exception of complex
types. Calcite does not have a STRING type, so the string type is
represented as VARCHAR(MAXINT) similar to how Hive represents their
STRING type.
The ImpalaTypeConverter file is used to convert the Impala Type object
to corresponding Calcite objects.
Authorization is not yet working with this current commit. A Jira has been
filed (IMPALA-13011) to deal with this.
Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Reviewed-on: http://gerrit.cloudera.org:8080/21109
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
This implements on-disk caching for the tuple cache. The
TupleCacheNode uses the TupleFileWriter and TupleFileReader
to write and read back tuples from local files. The file format
uses RowBatch's standard serialization used for KRPC data streams.
The TupleCacheMgr is the daemon-level structure that coordinates
the state machine for cache entries, including eviction. When a
writer is adding an entry, it inserts an IN_PROGRESS entry before
starting to write data. This does not count towards cache capacity,
because the total size is not known yet. This IN_PROGRESS entry
prevents other writers from concurrently writing the same entry.
If the write is successful, the entry transitions to the COMPLETE
state and updates the total size of the entry. If the write is
unsuccessful and a new execution might succeed, then the entry is
removed. If the write is unsuccessful and won't succeed later
(e.g. if the total size of the entry exceeds the max size of an
entry), then it transitions to the TOMBSTONE state. TOMBSTONE
entries avoid the overhead of trying to write entries that are
too large.
Given these states, when a TupleCacheNode is doing its initial
Lookup() call, one of three things can happen:
1. It can find a COMPLETE entry and read it.
2. It can find an IN_PROGRESS/TOMBSTONE entry, which means it
cannot read or write the entry.
3. It finds no entry and inserts its own IN_PROGRESS entry
to start a write.
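In rough pseudo-Python, the lookup decision looks like this (a sketch
of the three outcomes above, not the actual C++ implementation in
TupleCacheMgr/TupleCacheNode):

def lookup(cache, key):
  entry = cache.get(key)
  if entry is None:
    cache[key] = "IN_PROGRESS"  # claim the entry; this node becomes the writer
    return "WRITE"
  if entry == "COMPLETE":
    return "READ"               # cached tuples can be read back
  return "SKIP"                 # IN_PROGRESS or TOMBSTONE: run the child, no caching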
The tuple cache is configured using the tuple_cache parameter,
which is a combination of the cache directory and the capacity
similar to the data_cache parameter. For example, /data/0:100GB
uses directory /data/0 for the cache with a total capacity of
100GB. This currently supports a single directory, but it can
be expanded to multiple directories later if needed. The cache
eviction policy can be specified via the tuple_cache_eviction_policy
parameter, which currently supports LRU or LIRS. The tuple_cache
parameter cannot be specified if allow_tuple_caching=false.
This contains contributions from Michael Smith, Yida Wu,
and Joe McDonnell.
Testing:
- This adds basic custom cluster tests for the tuple cache.
Change-Id: I13a65c4c0559cad3559d5f714a074dd06e9cc9bf
Reviewed-on: http://gerrit.cloudera.org:8080/21171
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Kurt Deschler <kdeschle@cloudera.com>
This patch adds a plan node framework for caching of intermediate result
tuples within a query. Actual caching of data will be implemented in
subsequent patches.
A new plan node type TupleCacheNode is introduced for brokering caching
decisions at runtime. If the result is in the cache, the TupleCacheNode will
return results from the cache and skip executing its child node. If the
result is not cached, the TupleCacheNode will execute its child node and
mirror the resulting RowBatches to the cache.
The TupleCachePlanner decides where to place the TupleCacheNodes. To
calculate eligibility and cache keys, the plan must be in a stable state
that will not change shape. TupleCachePlanner currently runs at the end
of planning after the DistributedPlanner and ParallelPlanner have run.
As a first cut, TupleCachePlanner places TupleCacheNodes at every
eligible location. Eligibility is currently restricted to immediately
above HdfsScanNodes. This implementation will need to incorporate cost
heuristics and other policies for placement.
Each TupleCacheNode has a hash key that is generated from the logical
plan below for the purpose of identifying results that have been cached
by semantically equivalent query subtrees. The initial implementation of
the subtree hash uses the plan Thrift to uniquely identify the subtree.
Tuple caching is enabled by setting the enable_tuple_cache query option
to true. As a safeguard during development, enable_tuple_cache can only
be set to true if the "allow_tuple_caching" startup option is set to
true. It defaults to false to minimize the impact on production clusters.
bin/start-impala-cluster.py sets allow_tuple_caching=true by default
to enable it in the development environment.
Testing:
- This adds a frontend test that does basic checks for cache keys and
eligibility
- This verifies the presence of the caching information in the explain
plan output.
Change-Id: Ia1f36a87dcce6efd5d1e1f0bc04009bf009b1961
Reviewed-on: http://gerrit.cloudera.org:8080/21035
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
Reviewed-by: Kurt Deschler <kdeschle@cloudera.com>
The DiskIoMgr starts a large number of threads for each different
type of object store, most of which are idle. For development,
this slows down processing minidumps and debugging with gdb.
This adds an option "reduce_disk_io_threads" to bin/start-impala-cluster.py
that sets the thread count startup parameter to one for any filesystem
that is not the TARGET_FILESYSTEM. On a typical development setup
running against HDFS, this reduces the number of DiskIoMgr threads
by 150 and the HDFS monitoring threads by 150 as well. This option is
enabled by default. It can be disabled by setting --reduce_disk_io_threads=False
for bin/start-impala-cluster.py.
Separately, DiskIoMgr should be modified to reduce the number of
threads it spawns in general.
Testing:
- Hand tested this on my local development system
Change-Id: Ic8ee1fb1f9b9fe65d542d024573562b3bb120b76
Reviewed-on: http://gerrit.cloudera.org:8080/20920
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-11184 added code to target a specific PID for log rotation. This
aligns with glog behavior and adds safety: log rotation strictly
considers only the log files made by the currently running impalad and
excludes logs made by a previous PID or by other live, colocated
impalads. The downside of this limit is that logs can start to
accumulate on a node when impalad is frequently restarted, and this is
only resolvable by an admin manually removing old logs.
To help avoid this manual removal, this patch adds a backend flag
'log_rotation_match_pid' that relaxes the limit by dropping the PID
from the glob pattern. The default value for this new flag is False.
However, for testing purposes, start-impala-cluster.py overrides it to
True since the test minicluster logs to a common log directory. Setting
'log_rotation_match_pid' to True prevents one impalad from interfering
with the log rotation of the other impalads in the minicluster.
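Roughly, the difference in the rotation glob looks like this (the
paths and patterns are illustrative; the real matching is done in the
backend logging code):

import glob
import os

# log_rotation_match_pid=True: only consider files written by this process.
strict = glob.glob("/tmp/logs/impalad.*.log.INFO.*.%d" % os.getpid())
# log_rotation_match_pid=False (the default): drop the PID component so
# logs left behind by previous or colocated impalads are rotated as well.
relaxed = glob.glob("/tmp/logs/impalad.*.log.INFO.*")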
As a minimum exercise for this new log rotation behavior,
test_breakpad.py::TestLogging is modified to invoke
start-impala-cluster.py with 'log_rotation_match_pid' set to False.
Testing:
- Add test_excessive_cerr_ignore_pid and test_excessive_cerr_match_pid.
- Split TestLogging into two: one runs test_excessive_cerr_ignore_pid
in core exploration, while the other runs the rest of the logging tests
in exhaustive exploration.
- Pass exhaustive tests.
Change-Id: I599799e73f27f941a1d7f3dec0f40b4f05ea5ceb
Reviewed-on: http://gerrit.cloudera.org:8080/20754
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
To support statestore HA, we allow two statestored instances in an
Active-Passive HA pair to be added to an Impala cluster. We add
preemptive behavior for statestored. When HA is enabled, the preemptive
behavior allows the statestored with the higher priority to become
active while the paired statestored becomes standby. The active
statestored acts as the owner of the Impala cluster and provides the
statestore service for the cluster members.
To enable statestore HA for a cluster, the two statestoreds in the HA
pair and all subscribers must be started with the startup flag
"enable_statestored_ha" set to true.
This patch makes the following changes:
- Defined a new service for statestore HA.
- Statestored negotiates the HA role with its peer statestored instance
on startup.
- Created an HA monitor thread:
The active statestored sends heartbeats to the standby statestored.
The standby statestored monitors the peer's connection states with the
subscribers.
- The standby statestored sends heartbeats to subscribers with a
request for the connection state between the active statestored and the
subscribers. The standby statestored saves the connection states as a
failure detector.
- When the standby statestored loses its connection with the active
statestored, it checks the connection states for the active statestored
and takes over the active role if a majority of subscribers have lost
their connections with the active statestored.
- The new active statestored sends RPC notifications to all
subscribers announcing the new active statestored and the active
catalogd elected by the new active statestored.
- The new active statestored starts sending heartbeats to its peer when
it receives a handshake from its peer.
- The active statestored enters recovery mode if it loses its
connections with its peer statestored and all subscribers. It keeps
sending the HA handshake to its peer until it receives a response.
- All subscribers (impalad/catalogd/admissiond) register with both
statestoreds.
- Subscribers report connection states for the requests from the
standby statestored.
- Subscribers switch to the new active statestored when they receive
RPC notifications from it.
- Only the active statestored sends topic update messages to
subscribers.
- Added a new option "enable_statestored_ha" in
bin/start-impala-cluster.py for starting an Impala mini-cluster with
statestored HA enabled.
- Added a new Thrift API in the statestore service to disable the
network for a statestored. It is only used in unit tests to simulate
network failures. For safety, it only works when the corresponding
debug action is set in the startup flags.
Testing:
- Added end-to-end unit tests for statestored HA.
- Passed core tests
Change-Id: Ibd2c814bbad5c04c1d50c2edaa5b910c82a6fd87
Reviewed-on: http://gerrit.cloudera.org:8080/20372
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch allows a third party to inject their own frontend
class instead of using the default JniFrontend included in the
project.
The test case includes an interface that runs queries as normal
except for the "select 1" query which gets changed to "select 42".
Change-Id: I89e677da557b39232847644b6ff17510e2b3c3d5
Reviewed-on: http://gerrit.cloudera.org:8080/20459
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Enables building for Java 17 - and particularly using Java 17 in
containers - but won't run a minicluster fully with Java 17 as some
projects (Hadoop) don't yet support it.
Starting with Java 15, ehcache.sizeof encounters
UnsupportedOperationException: can't get field offset on a hidden class
in class members pointing to capturing lambda functions. Java 17 also
introduces new modules that need to be added to add-opens. Both of these
pose problems for continued use of ehcache.
Adds https://github.com/jbellis/jamm as a new cache weigher for Java
15+. We build from HEAD as an external project until Java 17 support is
released (https://github.com/jbellis/jamm/issues/44). Adds the
'java_weigher' option to select 'sizeof' or 'jamm'; defaults to 'auto',
which uses jamm for Java 15+ and sizeof for everything else. Also adds
metrics for viewing cache weight results.
Adds JAVA_HOME/lib/server to LD_LIBRARY_PATH in run-jvm-binary to
simplify switching between JDK versions for testing. You can now
- export IMPALA_JDK_VERSION=11
- source bin/impala-config.sh
- start-impala-cluster.py
and have Impala running a different JDK (11) version.
Retains add-opens calls that are still necessary due to dependencies'
use of lambdas for jamm, and all others for ehcache. Add-opens are still
required as a fallback, as noted in
https://github.com/jbellis/jamm#object-graph-crawling. We catch the
exceptions jamm and ehcache throw - CannotAccessFieldException,
UnsupportedOperationException - to avoid crashing Impala, and add them
to the list of banned log messages (as we should add the add-opens when
we find them).
Testing:
- container test run with Java 11 and 17 (excludes custom cluster)
- manual custom_cluster/test_local_catalog.py +
test_banned_log_messages.py run with Java 11 and 17 (Java 8 build)
- full Java 11 build (passed except IMPALA-12184)
- add test catalog cache entry size metrics fit reasonable bounds
- add unit test for utility to find jamm jar file in classpath
Change-Id: Ic378896f572e030a3a019646a96a32a07866a737
Reviewed-on: http://gerrit.cloudera.org:8080/19863
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
RedHat 9 and Ubuntu 22 switch to cgroups v2, which has a different
hierarchy than cgroups v1. Ubuntu 20 has a hybrid layout with both
cgroup and cgroup2 mounted, but the cgroup2 functionality is limited.
Updates cgroup-util to
- identify available cgroups in FindCGroupMounts. Prefers v1 if
available, as Ubuntu 20's hybrid layout provides only limited v2
interfaces.
- refactors file reading to follow guidelines from
https://gehrcke.de/2011/06/reading-files-in-c-using-ifstream-dealing-correctly-with-badbit-failbit-eofbit-and-perror/
for clearer error handling. Specifically, failbit doesn't set errno, but
we were printing it anyway (which produced misleading errors).
- updates FindCGroupMemLimit to read memory.max for cgroups v2.
- updates DebugString to print the correct property based on cgroup
version.
Removes unused cgroups test library.
Testing:
- proc-info-test CGroupInfo.ErrorHandling test on RHEL 9 and Ubuntu 20.
- verified no error messages related to reading cgroup present in logs
on RHEL 9 and Ubuntu 20.
Change-Id: I8dc499bd1b490970d30ed6dcd2d16d14ab41ee8c
Reviewed-on: http://gerrit.cloudera.org:8080/20105
Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
To support catalog HA, we allow two catalogd instances in an Active-
Passive HA pair to be added to an Impala cluster.
We add the preemptive behavior for catalogd. When enabled, the
preemptive behavior allows the catalogd with the higher priority to
become active and the paired catalogd becomes standby. The active
catalogd acts as the source of metadata and provides catalog service
for the Impala cluster.
To enable catalog HA for a cluster, the two catalogds in the HA pair
and the statestore must be started with the startup flag
"enable_catalogd_ha".
The catalogds in an Active-Passive HA pair can be assigned an instance
priority value to indicate a preference for which catalogd should
assume the active role. The registration ID, which is assigned by the
statestore, can be used as the instance priority value. A lower
numerical value in the registration ID corresponds to a higher
priority. The catalogd with the higher priority is designated as
active; the other catalogd is designated as standby. Only the active
catalogd propagates the IMPALA_CATALOG_TOPIC to the cluster. This
guarantees only one writer for the IMPALA_CATALOG_TOPIC in an Impala
cluster.
The statestore, which is the registration center of an Impala cluster,
assigns the roles for the catalogds in the HA pair after both catalogds
register with the statestore. When the statestore detects that the
active catalogd is not healthy, it fails over the catalog service to
the standby catalogd. When a failover occurs, the statestore sends
notifications with the address of the active catalogd to all
coordinators and catalogds in the cluster. The events are logged in the
statestore and catalogd logs. When the catalogd with the higher
priority recovers from a failure, the statestore does not resume it as
active, to avoid flip-flopping between the two catalogds.
To make a specific catalogd in the HA pair the active instance, that
catalogd must be started with the startup flag "force_catalogd_active"
so that it is assigned the active role when it registers with the
statestore. This allows an administrator to manually perform a catalog
service failover.
Added the option "--enable_catalogd_ha" in bin/start-impala-cluster.py.
If the option is specified when running the script, the script will
create an Impala cluster with two catalogd instances in an HA pair.
Testing:
- Passed the core tests.
- Added unit-test for auto failover and manual failover.
Change-Id: I68ce7e57014e2a01133aede7853a212d90688ddd
Reviewed-on: http://gerrit.cloudera.org:8080/19914
Reviewed-by: Xiang Yang <yx91490@126.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tamas Mate <tmater@apache.org>
During Impala startup, before starting the JVM (by calling libhdfs),
this adds add-opens flags to JAVA_TOOL_OPTIONS to ensure Ehcache has
access to non-public members so it can accurately calculate object
sizes. This effectively circumvents new security precautions in Java 9+.
Use '--jvm_automatic_add_opens=false' to disable it.
Tested with Java 11
JDBC_TEST=false EE_TEST=false FE_TEST=false BE_TEST=false \
CLUSTER_TEST_FILES=custom_cluster/test_local_catalog.py \
run-all-tests.sh
Change-Id: I47a6533b2aa94593d9348e8e3606633f06a111e8
Reviewed-on: http://gerrit.cloudera.org:8080/19845
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds new environment variable IMPALA_JDK_VERSION which can be 'system',
'8', or '11'. The default is 'system', which uses the same logic as
before. If set to 8 or 11, it will ignore the system java and search for
java of that specific version (based on specific directories for Ubuntu
and Redhat). This is used by bin/bootstrap_system.sh to determine
whether to install java 8 or java 11 (other versions can come later). If
IMPALA_JDK_VERSION=11, then bin/start-impala-cluster.py adds the opens
needed to deal with the ehcache issue.
This no longer puts JAVA_HOME in bin/impala-config-local.sh as part of
bootstrap_system.sh. Instead, it provides a new environment variable
IMPALA_JAVA_HOME_OVERRIDE, which will be preferred over
IMPALA_JDK_VERSION.
This also updates the versions of Maven plugins related to the build.
Source and target releases are still set to Java 8 compatibility.
Adds a verifier to the end of run-all-tests that
InaccessibleObjectException is not present in impalad logs. Tested with
JDBC_TEST=false EE_TEST=false FE_TEST=false BE_TEST=false \
CLUSTER_TEST_FILES=custom_cluster/test_local_catalog.py \
run-all-tests.sh
Testing: ran test suite with Java 11
Change-Id: I15d309e2092c12d7fdd2c99b727f3a8eed8bc07a
Reviewed-on: http://gerrit.cloudera.org:8080/19539
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
This changes the docker image build code so that both Java 8 and Java 11
images can be built in the same build. Specifically, it introduces new
Make targets for Java 11 docker images in addition to the regular Java 8
targets. The "docker_images" and "docker_debug_images" targets continue
to behave the same way and produce Java 8 images of the same name. The
"docker_java11_images" and "docker_debug_java11_images" produce the
daemon docker images for Java 11.
Preserves IMPALA_DOCKER_USE_JAVA11 for selecting Java 11 images when
starting a cluster with container images.
Change-Id: Ic2b124267c607242bc2fd6c8cd6486293a938f50
Reviewed-on: http://gerrit.cloudera.org:8080/19722
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
We're starting to see environments where the system Python ('python') is
Python 3. Updates utility and build scripts to work with Python 3, and
updates check-pylint-py3k.sh to check scripts that use system python.
Fixes other issues found during a full build and test run with Python
3.8 as the default for 'python'.
Fixes an impala-shell tip that was supposed to have been two tips (and
had no space after the period when they were printed).
Removes out-of-date deploy.py and various Python 2.6 workarounds.
Testing:
- Full build with /usr/bin/python pointed to python3
- run-all-tests passed with python pointed to python3
- ran push_to_asf.py
Change-Id: Idff388aff33817b0629347f5843ec34c78f0d0cb
Reviewed-on: http://gerrit.cloudera.org:8080/19697
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
This patch implements asynchronous writes to the data cache to improve
scan performance when a cache miss happens.
Previously, writes to the data cache were synchronous with HDFS file
reads, and both were handled by the remote HDFS IO threads. In other
words, if a cache miss occurred, the IO thread had to take on the
additional responsibility of cache writes, which degraded scan
performance.
This patch uses a thread pool for asynchronous writes, and the number of
threads in the pool is determined by the new configuration
'data_cache_num_write_threads'. In asynchronous write mode, the IO
thread only needs to copy data to a temporary buffer when storing data
into the data cache. The additional memory consumption caused by the
temporary buffers can be limited via the new configuration
'data_cache_write_buffer_limit'.
Testing:
- Add test cases for asynchronous data writing to the original
DataCacheTest using different numbers of threads.
- Add DataCacheTest#OutOfWriteBufferLimit, used to test the limit of
memory consumed by temporary buffers in the case of asynchronous
writes.
- Add a timer to the MultiThreadedReadWrite function to get the average
time of multithreaded writes. Here are some test cases whose times
differ significantly between synchronous and asynchronous:
Test case                | Policy | Sync/Async | write time in ms
MultiThreadedNoMisses    | LRU    | Sync       | 12.20
MultiThreadedNoMisses    | LRU    | Async      | 20.74
MultiThreadedNoMisses    | LIRS   | Sync       | 9.42
MultiThreadedNoMisses    | LIRS   | Async      | 16.75
MultiThreadedWithMisses  | LRU    | Sync       | 510.87
MultiThreadedWithMisses  | LRU    | Async      | 10.06
MultiThreadedWithMisses  | LIRS   | Sync       | 1872.11
MultiThreadedWithMisses  | LIRS   | Async      | 11.02
MultiPartitions          | LRU    | Sync       | 1.20
MultiPartitions          | LRU    | Async      | 5.23
MultiPartitions          | LIRS   | Sync       | 1.26
MultiPartitions          | LIRS   | Async      | 7.91
AccessTraceAnonymization | LRU    | Sync       | 1963.89
AccessTraceAnonymization | LRU    | Sync       | 2073.62
AccessTraceAnonymization | LRU    | Async      | 9.43
AccessTraceAnonymization | LRU    | Async      | 13.13
AccessTraceAnonymization | LIRS   | Sync       | 1663.93
AccessTraceAnonymization | LIRS   | Sync       | 1501.86
AccessTraceAnonymization | LIRS   | Async      | 12.83
AccessTraceAnonymization | LIRS   | Async      | 12.74
Change-Id: I878f7486d485b6288de1a9145f49576b7155d312
Reviewed-on: http://gerrit.cloudera.org:8080/19475
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Python 3 made the main dictionary methods lazy (items(),
keys(), values()). This means that code that uses those
methods may need to wrap the call in list() to get a
list immediately. Python 3 also removed the old iter*
lazy variants.
This changes all locations to use Python 3 dictionary
methods and wraps calls with list() appropriately.
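For example (a minimal illustration of the conversion pattern; the
variable names are made up):

d = {"a": 1, "b": 2}

# Python 2 only (removed or changed in Python 3):
#   for k, v in d.iteritems(): ...
#   keys = d.keys()            # a list in Python 2, a lazy view in Python 3
# Python 2/3-compatible form used after this change:
for k, v in d.items():
  pass
keys = list(d.keys())  # wrap with list() where a real list is needed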
This also changes all iteritems(), itervalues(), iterkeys()
locations to items(), values(), keys(), etc. Python 2
will not use the lazy implementation of these, so there
is a theoretical performance impact. Our python code is
mostly for tests and the performance impact is minimal.
Python 2 will be deprecated when Python 3 is functional.
This addresses these pylint warnings:
dict-iter-method
dict-keys-not-iterating
dict-values-not-iterating
Testing:
- Ran core tests
Change-Id: Ie873ece54a633a8a95ed4600b1df4be7542348da
Reviewed-on: http://gerrit.cloudera.org:8080/19590
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Python 3 changes list operators such as range, map, and filter
to be lazy. Some code that expects the list operators to happen
immediately will fail. e.g.
Python 2:
range(0,5) == [0,1,2,3,4]
True
Python 3:
range(0,5) == [0,1,2,3,4]
False
The fix is to wrap locations with list(). i.e.
Python 3:
list(range(0,5)) == [0,1,2,3,4]
True
Since the base operators are now lazy, Python 3 also removes the
old lazy versions (e.g. xrange, ifilter, izip, etc). This uses
future's builtins package to convert the code to the Python 3
behavior (i.e. xrange -> future's builtins.range).
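For instance, the xrange conversion ends up looking roughly like this
(an illustrative sketch; real call sites vary):

from builtins import range  # future's backport: lazy range on Python 2 too

for i in range(1000):       # replaces the Python-2-only xrange(1000)
  pass

evens = list(range(0, 10, 2))  # wrap with list() where a real list is expected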
Most of the changes were done via these futurize fixes:
- libfuturize.fixes.fix_xrange_with_import
- lib2to3.fixes.fix_map
- lib2to3.fixes.fix_filter
This eliminates the pylint warnings:
- xrange-builtin
- range-builtin-not-iterating
- map-builtin-not-iterating
- zip-builtin-not-iterating
- filter-builtin-not-iterating
- reduce-builtin
- deprecated-itertools-function
Testing:
- Ran core job
Change-Id: Ic7c082711f8eff451a1b5c085e97461c327edb5f
Reviewed-on: http://gerrit.cloudera.org:8080/19589
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
1. Python 3 requires absolute imports within packages. This
can be emulated via "from __future__ import absolute_import"
2. Python 3 changed division to "true" division that doesn't
round to an integer. This can be emulated via
"from __future__ import division"
This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.
I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
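Concretely, the added header and the division changes look roughly
like this (an illustrative sketch):

from __future__ import absolute_import, division, print_function

half = 7 / 2   # 3.5 on both Python 2 and 3 once division is imported
idx = 7 // 2   # 3; integer division where an index or count is needed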
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
Testing:
- Ran core tests
Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Python 3 does not support this old except syntax:
except Exception, e:
Instead, it needs to be:
except Exception as e:
This uses impala-futurize to fix all locations of
the old syntax.
Testing:
- The check-python-syntax.sh no longer shows errors
for except syntax.
Change-Id: I1737281a61fa159c8d91b7d4eea593177c0bd6c9
Reviewed-on: http://gerrit.cloudera.org:8080/19551
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
This change adds geospatial functions from Hive's ESRI library
as builtin UDFs. Plain Hive UDFs are imported without changes,
but the generic and varargs functions are handled differently;
generic functions are added with all of the combinations of
their parameters (cartesian product of the parameters), and
varargs functions are unfolded as an nth parameter simple
function. The varargs function wrappers are generated at build
time and they can be configured in
gen_geospatial_udf_wrappers.py. These additional steps are
required because of the limitations in Impala's UDF Executor
(lack of varargs support and only partial generics support)
which could be further improved; in this case, the additional
wrapping/mapping steps could be removed.
Changes regarding function handling/creating are sourced from
https://gerrit.cloudera.org/c/19177
A new backend flag, "geospatial_library", was added to turn this
feature on or off. The default value is "NONE", which means no
geospatial functions get registered as builtins; the value "HIVE_ESRI"
enables this implementation.
The ESRI geospatial implementation for Hive is currently only
available in Hive 4, but CDP Hive backported it to Hive 3;
therefore, for Apache Hive this feature is disabled
regardless of the "geospatial_library" flag.
Known limitations:
- ST_MultiLineString, ST_MultiPolygon only work
with the WKT overload
- ST_Polygon supports a maximum of 6 pairs of coordinates
- ST_MultiPoint, ST_LineString support a maximum of 7
pairs of coordinates
- ST_ConvexHull, ST_Union support a maximum of 6 geoms
These limits can be increased in gen_geospatial_udf_wrappers.py
Tests:
- test_geospatial_udfs.py added based on
https://github.com/Esri/spatial-framework-for-hadoop
Co-Authored-by: Csaba Ringhofer <csringhofer@cloudera.com>
Change-Id: If0ca02a70b4ba244778c9db6d14df4423072b225
Reviewed-on: http://gerrit.cloudera.org:8080/19425
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
- If the external_fe_port flag is > 0, spins up a new HS2-compatible
service port.
- Added an enable_external_fe_support option to start-impala-cluster.py,
which, when set, starts the impala cluster with external_fe_port on
21150-21152.
- Modified the impalad_coordinator Dockerfile to expose the external
frontend port at 21150.
- The intent of this commit is to separate external frontend
connections from normal hs2 connections. This allows different
security policies to be applied to each type of connection. The
external_fe_port should be considered a privileged service and should
only be exposed to an external frontend that does user authentication
and performs authorization checks on generated plans.
Change-Id: I991b5b05e12e37d8739e18ed1086bbb0228acc40
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Reviewed-on: http://gerrit.cloudera.org:8080/17125
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
A recent patch (IMPALA-9930) introduces a new admission control rpc
service, which can be configured to perform admission control for
coordinators. In that patch, the admission service runs in an impalad.
This patch separates the service out to run in a new daemon, called
the admissiond. It also integrates this new daemon with the build
infrastructure around Docker.
Some notable changes:
- Adds a new class, AdmissiondEnv, which performs the same function
for the admissiond as ExecEnv does for impalads.
- The '/admission' http endpoint is exposed on the admissiond's webui
if the admission control service is in use, otherwise it is exposed
on coordinator impalad's webuis.
- start-impala-cluster.py takes a new flag --enable_admission_service
which configures the minicluster to have an admissiond with all
coordinators using it for admission control.
- Coordinators are now configured to use the admission service by
specifying the startup flag --admission_service_host. This is
intended to mirror the configuration of the statestored/catalogd
location.
Testing:
- Existing tests for the admission control service are modified to run
with an admissiond.
- Manually ran start-impala-cluster.py with --enable_admission_service
and --docker_network to verify Docker integration.
Change-Id: Id677814b31e9193035e8cf0d08aba0ce388a0ad9
Reviewed-on: http://gerrit.cloudera.org:8080/16891
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The legacy Thrift based Impala internal service has been removed so
the backend port 22000 can be freed up.
This patch sets the be_port flag as a REMOVED_FLAG, and all infrastructure
around it is cleaned up. StatestoreSubscriber::subscriber_id is now set
as hostname + krpc_port.
Testing:
- Passed the exhaustive test.
Change-Id: Ic6909a8da449b4d25ee98037b3eb459af4850dc6
Reviewed-on: http://gerrit.cloudera.org:8080/16533
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Recent testing showed that the pytests were not
respecting the log level and format set in
conftest.py's configure_logging(); they were using
the default log level of WARNING and the
default formatter.
The issue is that logging.basicConfig() is only
effective the first time it is called. The code
in lib/python/impala_py_lib/helpers.py does a
call to logging.basicConfig() at the global
level, and conftest.py imports that file. This
renders the call in configure_logging()
ineffective.
To avoid this type of confusion, logging.basicConfig()
should only be called from main() functions, not at module
level in libraries. This removes the call in lib/python/impala_py_lib
(as it is not needed for a library without a main function).
It also fixes up various other locations to move the
logging.basicConfig() call into the main() function.
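A minimal, self-contained illustration of the underlying Python
behavior (not Impala code): the second basicConfig() call below is
silently ignored because the root logger already has a handler, which
is exactly how the import-time call in helpers.py masked the one in
configure_logging():

  import logging

  # Simulates the import-time call in a library module: this one wins.
  logging.basicConfig(level=logging.WARNING)

  # Simulates the later call in conftest.py's configure_logging(): it is
  # a no-op because the root logger already has a handler configured.
  logging.basicConfig(level=logging.INFO,
                      format="%(asctime)s %(threadName)s: %(message)s")

  # Dropped: the root logger is still at WARNING with the default format.
  logging.getLogger(__name__).info("this message is never emitted")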
Testing:
- Ran the end to end tests and custom cluster tests
- Confirmed the logging format
- Added an assert in configure_logging() to test that
the INFO log level is applied to the root logger.
Change-Id: I5d91b7f910b3606c50bcba4579179a0bc8c20588
Reviewed-on: http://gerrit.cloudera.org:8080/16679
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch introduces a new approach to limiting the memory usage
of both the mini-cluster and the CDH cluster.
Without this limit, clusters are prone to getting killed when running
in docker containers with a lower mem limit than the host's memory
size, e.g. the mini-cluster may be running in a
container limited to 32GB by cgroups, while the host machine has
128GB. Under this circumstance, if the container is started with the
'--privileged' command argument, both the mini and CDH clusters compute
their mem_limit according to 128GB rather than 32GB. They will be
killed when attempting to claim the extra resources.
Currently, the mem-limit estimating algorithms for Impalad and Node
Manager are different:
for Impalad: mem_limit = 0.7 * sys_mem / cluster_size (default is 3)
for Node Manager:
1. Set aside 24GB, then fit what is left into the thresholds below.
2. The minimum limit is 4GB and the maximum limit is 48GB.
To hedge against over-consumption, we:
- Added a new environment variable, IMPALA_CLUSTER_MAX_MEM_GB.
- Modified the algorithm in 'bin/start-impala-cluster.py', making it
take IMPALA_CLUSTER_MAX_MEM_GB rather than sys_mem into account
(see the sketch after this list).
- Modified the logic in
'testdata/cluster/node_templates/common/etc/hadoop/conf/yarn-site.xml.py'
similarly, substituting IMPALA_CLUSTER_MAX_MEM_GB for sys_mem.
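A rough sketch of the adjusted per-impalad computation using the
0.7 * sys_mem / cluster_size formula above (function and variable
names are illustrative, not the script's actual code):

  import os

  def impalad_mem_limit_gb(sys_mem_gb, cluster_size=3):
      # Use IMPALA_CLUSTER_MAX_MEM_GB (e.g. the container's cgroup cap)
      # instead of the host memory reported by the kernel, if it is set.
      usable_gb = float(os.environ.get("IMPALA_CLUSTER_MAX_MEM_GB",
                                       sys_mem_gb))
      return 0.7 * usable_gb / cluster_size

  # On a 128GB host: ~29.9GB per impalad when the variable is unset;
  # with IMPALA_CLUSTER_MAX_MEM_GB=32 it drops to ~7.5GB.
  print(round(impalad_mem_limit_gb(128), 1))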
Testing: this patch worked in a 32GB docker container running on a 128GB
host machine. All 1188 unit tests passed.
Change-Id: I8537fd748e279d5a0e689872aeb4dbfd0c84dc93
Reviewed-on: http://gerrit.cloudera.org:8080/16522
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The data cache access trace was added in IMPALA-8542 as a way
to capture a workload's cache accesses to allow later analysis.
This modifies the data cache access trace to improve usability:
1. The access trace now uses a SimpleLogger to limit the total
number of trace entries per file and total number of trace
files. This caps the disk usage for the access trace. The
behavior is controlled by the data_cache_trace_dir,
max_data_cache_trace_file_size, and max_data_cache_trace_files
startup parameters.
2. This introduces the data_cache_trace_percentage, which allows
tracing only a subset of the entries produced. It traces
accesses for a consistent subset of the cache (i.e. accesses
for a filename/mtime/offset are either always traced or
never traced). This allows for better analysis than a random
sample. Tracing a subset of accesses can reduce any performance
overhead from tracing. It also provides a way to trace a longer
time period in the same number of entries.
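One way to implement such consistent sampling (illustrative Python
only; the data cache itself is C++ and may use a different hash) is to
hash the (filename, mtime, offset) key and compare the resulting bucket
against data_cache_trace_percentage:

  import zlib

  def should_trace(filename, mtime, offset, trace_percentage):
      # The same key always lands in the same bucket, so a given cache
      # entry is either always traced or never traced.
      key = "%s:%d:%d" % (filename, mtime, offset)
      bucket = zlib.crc32(key.encode()) % 100
      return bucket < trace_percentage

  print(should_trace("/warehouse/t1/file.parq", 1589000000, 4096, 25))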
This also implements the ability to replay traces against a
specific cache configuration. The replayer can produce JSON output
with cache hit/miss information for the original trace and the
replay. This provides a building block for building analysis
comparing different cache sizes or cache eviction policies.
Testing:
- New backend tests in data-cache-test, data-cache-trace-test
- Manually testing the data-cache-trace-replayer
Change-Id: I0f84204d8e5145f5fa8d4851d9c19ac317db168e
Reviewed-on: http://gerrit.cloudera.org:8080/15914
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This removes Impala-lzo from the Impala development environment.
Impala-lzo is not built as part of the Impala build. The LZO plugin
is no longer loaded. LZO tables are not loaded during dataload,
and LZO is no longer tested.
This removes some obsolete scan APIs that were only used by Impala-lzo.
With this commit, Impala-lzo would require code changes to build
against Impala.
The plugin infrastructure is not removed, and this leaves some
LZO support code in place. If someone were to decide to revive
Impala-lzo, they would still be able to load it as a plugin
and get the same functionality as before. This plugin support
may be removed later.
Testing:
- Dryrun of GVO
- Modified TestPartitionMetadataUncompressedTextOnly's
test_unsupported_text_compression() to add LZO case
Change-Id: I3a4f12247d8872b7e14c9feb4b2c58cfd60d4c0e
Reviewed-on: http://gerrit.cloudera.org:8080/15814
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Bump the disconnected_session_timeout to 6 hours in
./bin/start-impala-cluster.py.
This reduces test flakiness when running tests against the mini-cluster
using the hs2-http protocol. The issue is that a lot of the E2E tests
open an hs2-http connection on test startup, but might not use the
connection for a long time. The connection gets cleaned up and then
tests start to fail with "HiveServer2Error: Invalid session id"
exceptions.
This commonly happens in exhaustive tests where we add test dimensions on
the protocol used to execute E2E tests. This causes the tests to switch
between the beeswax, hs2, and hs2-http protocols. If a test spends over
an hour using the beeswax protocol, the hs2-http connection will get closed.
Change-Id: I061a6f96311d406daee14454f71699c8727292d1
Reviewed-on: http://gerrit.cloudera.org:8080/16014
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The kerberized minicluster is enabled by setting
IMPALA_KERBERIZE=true in impala-config-*.sh.
After setting it you must run ./bin/create-test-configuration.sh
and then restart the minicluster.
This adds a script to partially automate setup of a local KDC,
in lieu of the unmaintained minikdc support (which has been ripped
out).
Testing:
I was able to run some queries against pre-created HDFS tables
with kerberos enabled.
Change-Id: Ib34101d132e9c9d59da14537edf7d096f25e9bee
Reviewed-on: http://gerrit.cloudera.org:8080/15159
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
After upgrading Hive-3 to a version containing HIVE-22158, managed
tables are no longer allowed to be non-transactional. Creating a
non-ACID table results in an external table with the table property
'external.table.purge' set to true.
In Hive-3, external HDFS tables are located by default in
'metastore.warehouse.external.dir' if it's set. This property was
added by HIVE-19837 in Hive 2.7, but hasn't been added to Hive in cdh6
yet.
In a CTAS statement, we create a temporary HMS table for the analysis of
the Insert part. The table path is created assuming it's a managed
table, and the Insert part uses this path for insertion. However, in
Hive-3, the created table is translated to an external table, which is
not the same as what we passed to the HMS API. The created table is
located in 'metastore.warehouse.external.dir', while the table path we
assumed is in 'metastore.warehouse.dir'. This introduces bugs when these
two properties differ: the CTAS statement creates the table in one place
and inserts data in another.
This patch adds a new method in MetastoreShim to wrap the difference for
getting the default table path for non transactional tables between
Hive-2 and Hive-3.
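A purely illustrative Python sketch of the path selection the shim has
to encode (the actual change is a Java method in MetastoreShim; names
and parameters here are hypothetical):

  def default_table_path(warehouse_dir, external_warehouse_dir,
                         is_transactional, hive_major_version):
      # Hive-2: every table's default location is under
      # metastore.warehouse.dir.
      if hive_major_version < 3:
          return warehouse_dir
      # Hive-3 with HIVE-22158: non-transactional tables are translated
      # to external tables, so their default location is
      # metastore.warehouse.external.dir when that property is set.
      if not is_transactional and external_warehouse_dir:
          return external_warehouse_dir
      return warehouse_dir

  print(default_table_path("/warehouse/managed", "/warehouse/external",
                           False, 3))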
Changes in the infra:
- To support customizing the Hive configuration, add an env var,
CUSTOM_CLASSPATH, in bin/set-classpath.sh that is put in front of the
existing CLASSPATH. The customized hive-site.xml should be put inside
CUSTOM_CLASSPATH.
- Change hive-site.xml.py to generate a hive-site.xml with non default
'metastore.warehouse.external.dir'
- Add an option, --env_vars, in bin/start-impala-cluster.py to pass
down CUSTOM_CLASSPATH.
Tests:
- Add a custom cluster test to start Hive with
metastore.warehouse.external.dir set to a non-default value. Run
it locally using CDP components with HIVE-22158. xfail the test until
we bump CDP_BUILD_NUMBER to 1507246.
- Run CORE tests using CDH components
Change-Id: I460a57dc877ef68ad7dd0864a33b1599b1e9a8d9
Reviewed-on: http://gerrit.cloudera.org:8080/14527
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
The catalogd process sometimes changes its name to "main"
after an ubuntu 16.04 update.
This avoids the issue by checking the first element of the
command line instead, which should more reliably reflect the
binary that was executed.
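A minimal sketch of the distinction, assuming psutil is available (the
actual check in the test utilities may differ):

  import os
  import psutil  # assumed available in the test environment

  def is_catalogd(pid):
      proc = psutil.Process(pid)
      # proc.name() may report "main" after the OS update; argv[0] still
      # points at the binary that was executed.
      return os.path.basename(proc.cmdline()[0]) == "catalogd"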
Testing:
This failed consistently before the change and now passes consistently
on my development machine.
Change-Id: Ib9396669481e4194beb6247c8d8b6064cb5119bb
Reviewed-on: http://gerrit.cloudera.org:8080/13971
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
* Build scripts are generalised to have different targets for release
and debug images.
* Added new targets for the debug images, e.g. docker_debug_images
and statestored_debug_image. The release images still have the
same names.
* Separate build contexts are set up for the different base
images.
* The debug or release base image can be specified as the FROM
for the daemon images.
* start-impala-cluster.py picks the correct images for the build type
Future work:
We would like to generalise this to allow building from
non-ubuntu-16.04 base images. This probably requires another
layer of dockerfiles to specify a base image for impala_base
with the required packages installed.
Change-Id: I32d2e19cb671beacceebb2642aba01191bd7a244
Reviewed-on: http://gerrit.cloudera.org:8080/13905
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This adds support for the data cache in dockerised clusters in
start-impala-cluster.py. It is handled similarly to the
log directories - we ensure that a separate data cache
directory is created for each container, then mount
it at /opt/impala/cache inside the container.
This is then enabled by default for the dockerised tests.
Testing:
Did a dockerised test run.
Change-Id: I2c75d4a5c1eea7a540d051bb175537163dec0e29
Reviewed-on: http://gerrit.cloudera.org:8080/13934
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change adds support for running queries inside a single admission
control pool on one of several, disjoint sets of executors called
"executor groups".
Executors can be configured with an executor group through the newly
added '--executor_groups' flag. Note that in anticipation of future
changes, the flag already uses the plural form, but only a single
executor group may be specified for now. Each executor group
specification can optionally contain a minimum size, separated by a
':', e.g. --executor_groups default-pool-1:3. Only when the cluster
membership contains at least that number of executors for the groups
will it be considered for admission.
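As a small illustration, parsing a specification such as
'default-pool-1:3' into a group name and minimum size could look like
this (hypothetical helper, not the actual code):

  def parse_executor_group_spec(spec):
      # Parse '<group-name>[:<min-size>]', e.g. 'default-pool-1:3'.
      name, sep, min_size = spec.rpartition(":")
      if not sep:
          # No ':' present: the whole spec is the group name, no minimum.
          return min_size, 0
      return name, int(min_size)

  print(parse_executor_group_spec("default-pool-1:3"))  # ('default-pool-1', 3)
  print(parse_executor_group_spec("default-pool-1"))    # ('default-pool-1', 0)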
Executor groups are mapped to resource pools by their name: An executor
group can service queries from a resource pool if the pool name is a
prefix of the group name separated by a '-'. For example, queries in
pool poolA can be serviced by executor groups named poolA-1 and poolA-2,
but not by groups named foo or poolB-1.
During scheduling, executor groups are considered in alphabetical order.
This means that one group is filled up entirely before a subsequent
group is considered for admission. Groups also need to pass a health
check before being considered. In particular, they must contain at least the
minimum number of executors specified.
If no group is specified during startup, executors are added to the
default executor group. If - during admission - no executor group for a
pool can be found and the default group is non-empty, then the default
group is considered. The default group does not have a minimum size.
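A small illustrative Python sketch of the selection rules just
described (names and data structures are hypothetical; the real logic
lives in the C++ admission controller):

  def group_matches_pool(group_name, pool_name):
      # A group can serve a pool if the pool name is a prefix of the
      # group name separated by '-', e.g. 'poolA-1' serves 'poolA'.
      return group_name.startswith(pool_name + "-")

  def candidate_groups(pool_name, groups, default_group_size):
      # groups maps group name -> (num_executors, min_size). Groups are
      # considered in alphabetical order, so earlier groups fill up
      # before later ones are used.
      matching = sorted(g for g in groups if group_matches_pool(g, pool_name))
      if not matching and default_group_size > 0:
          # No group is configured for this pool: fall back to the
          # non-empty default group, which has no minimum size.
          return ["default"]
      # Health check: a group needs at least its minimum number of executors.
      return [g for g in matching if groups[g][0] >= groups[g][1]]

  groups = {"poolA-1": (3, 3), "poolA-2": (1, 3), "poolB-1": (3, 3)}
  print(candidate_groups("poolA", groups, default_group_size=2))  # ['poolA-1']
  print(candidate_groups("poolC", groups, default_group_size=2))  # ['default']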
This change inverts the order of scheduling and admission. Prior to this
change, queries were scheduled before submitting them to the admission
controller. Now the admission controller computes schedules for all
candidate executor groups before each admission attempt. If the cluster
membership has not changed, then the schedules of the previous attempt
will be reused. This means that queries will no longer fail if the
cluster membership changes while they are queued in the admission
controller.
This change also alters the default behavior when using a dedicated
coordinator and no executors have registered yet. Prior to this change,
a query would fail immediately with an error ("No executors registered
in group"). Now a query will get queued and wait until executors show
up, or it times out after the pool's queue timeout period.
Testing:
This change adds a new custom cluster test for executor groups. It
makes use of new capabilities added to start-impala-cluster.py to bring
up additional executors into an already running cluster.
Additionally, this change adds an instructional implementation of
executor group based autoscaling, which can be used during development.
It also adds a helper to run queries concurrently. Both are used in a
new test to exercise the executor group logic and to prevent regressions
to these tools.
In addition to these tests, the existing tests for the admission
controller (both BE and EE tests) thoroughly exercise the changed code.
Some of them required changes themselves to reflect the new behavior.
I looped the new tests (test_executor_groups and test_auto_scaling) for
a night (110 iterations each) without any issues.
I also started an autoscaling cluster with a single group and ran
TPC-DS, TPC-H, and test_queries on it successfully.
Known limitations:
When using executor groups, only a single coordinator and a single AC
pool (i.e. the default pool) are supported. Executors do not include the
number of currently running queries in their statestore updates and so
admission controllers are not aware of the number of queries admitted by
other controllers per host.
Change-Id: I8a1d0900f2a82bd2fc0a906cc094e442cffa189b
Reviewed-on: http://gerrit.cloudera.org:8080/13550
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Prior to this change a dedicated coordinator would not create the
default executor group when registering its own backend descriptor in
the cluster membership. This caused a misleading error message during
scheduling when the default executor group could not be found.
To improve this, we now always create the default executor group and
return an improved error message if it is empty.
This change adds a test that validates that a query against a cluster
without executors returns the expected error.
Change-Id: Ia4428ef833363f52b14dfff253569212427a8e2f
Reviewed-on: http://gerrit.cloudera.org:8080/13866
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>