42 Commits

Author SHA1 Message Date
Riza Suminto
28cff4022d IMPALA-14333: Run impala-py.test using Python3
Running exhaustive tests with the env var IMPALA_USE_PYTHON3_TESTS=true
reveals some tests that require adjustment. This patch makes those
adjustments, which mostly revolve around encoding differences and the
str vs bytes distinction in Python 3. It also switches the default to
running pytest with Python 3 by setting IMPALA_USE_PYTHON3_TESTS=true.
The following are the details:

Change the hash() function in conftest.py to crc32() to produce a
deterministic hash. Hash randomization has been enabled by default
since Python 3.3 (see
https://docs.python.org/3/reference/datamodel.html#object.__hash__),
which caused test sharding (like --shard_tests=1/2) to produce an
inconsistent set of tests per shard. Always restart the minicluster
during custom cluster tests if the --shard_tests argument is set,
because the test order may change and affect test correctness,
depending on whether a test runs on a fresh minicluster or not.
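
A minimal sketch of the deterministic sharding idea, assuming a helper
along these lines (the actual conftest.py code may differ):

import zlib

def shard_for(test_name, num_shards):
  # zlib.crc32 is stable across interpreter runs, unlike hash(), which
  # Python 3.3+ randomizes per process.
  return zlib.crc32(test_name.encode('utf-8')) % num_shards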

Moved one test case from delimited-latin-text.test to
test_delimited_text.py for easier binary comparison.

Add bytes_to_str() as a utility function to decode bytes in Python3.
This is often needed when inspecting the return value of
subprocess.check_output() as a string.
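
A hedged sketch of such a utility (the real helper's name and defaults
may differ):

def bytes_to_str(val, encoding='utf-8'):
  # subprocess.check_output() returns bytes under Python 3; decode so
  # callers can keep treating the result as str.
  return val.decode(encoding) if isinstance(val, bytes) else val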

Implement DataTypeMetaclass.__lt__ to substitute for
DataTypeMetaclass.__cmp__, which is ignored in Python 3 (see
https://peps.python.org/pep-0207/).
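
Illustrative only, since Python 3 ignores __cmp__ and requires rich
comparisons; the actual keying logic in the test suite differs:

class DataTypeMetaclass(type):
  def __lt__(cls, other):
    # Order data type classes so sorted() works under Python 3.
    return cls.__name__ < other.__name__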

Fix WEB_CERT_ERR difference in test_ipv6.py.

Fix trivial integer parsing in test_restart_services.py.

Fix various encoding issues in test_saml2_sso.py,
test_shell_commandline.py, and test_shell_interactive.py.

Change timeout in Impala.for_each_impalad() from sys.maxsize to 2^31-1.

Switch to binary comparison in test_iceberg.py where needed.

Specify text mode when calling tempfile.NamedTemporaryFile().
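
For example (illustrative):

import tempfile

# NamedTemporaryFile defaults to binary mode ('w+b'); mode='w' keeps
# str writes working the same under Python 2 and 3.
with tempfile.NamedTemporaryFile(mode='w') as f:
  f.write('select 1;\n')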

Simplify create_impala_shell_executable_dimension to skip testing the
dev and python2 impala-shell when IMPALA_USE_PYTHON3_TESTS=true. The
reason is that several UTF-8 related tests in test_shell_commandline.py
break in the Python 3 pytest + Python 2 impala-shell combination. This
skipping already happens automatically on build OSes without a system
Python 2 available, such as RHEL 9 (where the IMPALA_SYSTEM_PYTHON2 env
var is empty).

Removed unused vector argument and fixed some trivial flake8 issues.

Several tests require logic modifications due to intermittent issues
in Python 3 pytest. These include:

Add _run_query_with_client() in test_ranger.py to allow reusing a
single Impala client for running several queries, and ensure clients
are closed when the test is done. Mark several tests in test_ranger.py
with SkipIfFS.hive because they run queries through beeline +
HiveServer2, but the Ozone and S3 build environments do not start
HiveServer2 by default.

Increase the sleep period from 0.1 to 0.5 seconds per iteration in
test_statestore.py and mark TestStatestore to execute serially. This
is because TServer appears to shut down more slowly when run
concurrently with other tests. Also handle the deprecation of
Thread.setDaemon().

Always set force_restart=True for each test method in TestLoggingCore,
TestShellInteractiveReconnect, and TestQueryRetries to prevent them
from reusing the minicluster from a previous test method. Some of
these tests tear down the minicluster (kill impalad) and will produce
a minidump if the metrics verifier for the next test fails to detect a
healthy minicluster state.

Testing:
Pass exhaustive tests with IMPALA_USE_PYTHON3_TESTS=true.

Change-Id: I401a93b6cc7bcd17f41d24e7a310e0c882a550d4
Reviewed-on: http://gerrit.cloudera.org:8080/23319
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-03 10:01:29 +00:00
Joe McDonnell
1d0b2ef0c5 IMPALA-14164: Fix timeout for fragments in flight in TestScratchDir
On release builds, some tests in TestScratchDir have started hitting
a timeout waiting for num-fragments-in-flight to reach 2. The code
to wait for the metric sleeps one second between samples. If one of
the query fragments starts and finishes during that second, the test
will never see a sample containing two in-flight fragments. This
happens on release builds because they are faster and more likely to
complete within that second.
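
A hypothetical sketch of the racy wait loop (names are illustrative,
not the actual test code):

import time

def wait_for_metric(get_metric, expected, timeout_s=60):
  deadline = time.time() + timeout_s
  while time.time() < deadline:
    if get_metric('num-fragments-in-flight') == expected:
      return
    # A fragment that starts and finishes inside this one-second sleep
    # is never observed, so the loop can time out on fast builds.
    time.sleep(1)
  raise AssertionError('metric never reached %d' % expected)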

This removes the code that waits for num-fragments-in-flight. All the
tests have subsequent calls waiting for the scratch usage to reach a
certain value. This will properly wait for the fragment to start up
on its own. The num-fragments-in-flight wait doesn't add anything.

Testing:
 - Ran custom_cluster/test_scratch_disk.py multiple times with a
   release build

Change-Id: Ic8c573affc033056ba465c42bd420d5c1d3ba15c
Reviewed-on: http://gerrit.cloudera.org:8080/23081
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-06-25 01:10:09 +00:00
Csaba Ringhofer
f98b697c7b IMPALA-13929: Make 'functional-query' the default workload in tests
This change adds get_workload() to ImpalaTestSuite and removes it
from all test suites that already returned 'functional-query'.
get_workload() is also removed from CustomClusterTestSuite which
used to return 'tpch'.

All other changes besides impala_test_suite.py and
custom_cluster_test_suite.py are just mass removals of
get_workload() functions.
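
A minimal sketch of the new default, assuming the method is a
classmethod on the base suite:

class ImpalaTestSuite(object):
  @classmethod
  def get_workload(cls):
    return 'functional-query'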

The behavior is only changed in custom cluster tests that didn't
override get_workload(). By returning 'functional-query' instead
of 'tpch', exploration_strategy() will no longer return 'core' in
'exhaustive' test runs. See IMPALA-3947 on why workload affected
exploration_strategy. An example of an affected test is
TestCatalogHMSFailures, which was skipped in both core and exhaustive
runs before this change.

get_workload() functions that return a workload other than
'functional-query' are not changed; it is possible that some of these
also don't handle exploration_strategy() as expected, but individually
checking those tests is out of scope for this patch.

Change-Id: I9ec6c41ffb3a30e1ea2de773626d1485c69fe115
Reviewed-on: http://gerrit.cloudera.org:8080/22726
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-08 07:12:55 +00:00
Riza Suminto
00dc79adf6 IMPALA-13907: Remove reference to create_beeswax_client
This patch replaces create_beeswax_client() references with
create_hs2_client() or vector-based client creation, to prepare for
the hs2 test migration.

test_session_expiration_with_queued_query is changed to use
impala.dbapi directly from Impyla due to a limitation in
ImpylaHS2Connection.
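
For reference, connecting through Impyla's DB-API looks roughly like
this (host and port are illustrative):

from impala.dbapi import connect

conn = connect(host='localhost', port=21050)
cursor = conn.cursor()
cursor.execute('select 1')
print(cursor.fetchall())
conn.close()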

TestAdmissionControllerRawHS2 is migrated to use hs2 as the default
test protocol.

Modify test_query_expiration.py to set query options through the
client instead of a SET query. test_query_expiration is slightly
modified due to a behavior difference in the hs2 ImpylaHS2Connection.

Remove remaining reference to BeeswaxConnection.QueryState.

Fixed a bug in ImpylaHS2Connection.wait_for_finished_timeout().

Fix some easy flake8 issues caught through this command:
git show HEAD --name-only | grep '^tests.*py' \
  | xargs -I {} impala-flake8 {} \
  | grep -e U100 -e E111 -e E301 -e E302 -e E303 -e F...

Testing:
- Pass exhaustive tests.

Change-Id: I1d84251835d458cc87fb8fedfc20ee15aae18d51
Reviewed-on: http://gerrit.cloudera.org:8080/22700
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-29 18:37:45 +00:00
Yida Wu
ddd4f4f8d6 IMPALA-13798: Cleanup host-level remote scratch dir on shutdown
IMPALA-13677 introduced cleanup of remote scratch files on startup,
and also introduced a host-level directory to make cleanup easier.

An empty host-level directory in remote storage does not use
resources, but it can stay there forever if a host goes offline
permanently or its hostname changes. To improve this behavior, this
patch adds support for removing the remote host-level temporary
directory on shutdown. This helps prevent too many empty directories
being left in the remote scratch path, especially for hosts that
never restart.

Changed the flag remote_scratch_cleanup_on_startup to
remote_scratch_cleanup_on_start_stop, so that this flag also
controls cleanup on shutdown.

Tests:
Passed exhaustive tests.
Added an e2e testcase
test_scratch_dirs_remote_dir_removal_on_shutdown.

Change-Id: Ic8f446894afdf975630aef80a9d964a9a78d3b46
Reviewed-on: http://gerrit.cloudera.org:8080/22549
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-28 19:24:00 +00:00
Yida Wu
a159eb52f8 IMPALA-13677: Support remote scratch directory cleanup at Impala daemon startup
This patch introduces a new feature for cleaning up remote scratch
files during Impala daemon startup, ensuring that potential leftover
files from abnormal shutdowns are removed.

To allow efficient cleanup, this patch also refines the remote
scratch directory hierarchy by adding a host-level directory,
changing it from:
<base_dir>/<backend_id>_<query_id>/<file_name>
to:
<base_dir>/<hostname>/<backend_id>_<query_id>/<file_name>
<base_dir> is <scratch_dir_config_path>/impala-scratch.

During startup, if the host-level directory exists, it will be
removed entirely.

This design assumes one Impala daemon per host, and it also
assumes that multiple Impala clusters don't share the same
scratch_dir path on the remote filesystem. Even if they share the
same prefix, each Impala cluster should have dedicated paths:
--scratch_dirs=hdfs://remote_dir/scratch/impala1
--scratch_dirs=hdfs://remote_dir/scratch/impala2

Also added a flag, remote_scratch_cleanup_on_startup, to control
whether the host-level directory is cleaned during Impala daemon
startup. By default, this feature is enabled. If multiple daemons on a
host or multiple clusters share the same remote scratch_dir path, it
can be set to false to prevent unintended cleanup.

Tests:
Passed exhaustive tests.
Adds testcase test_scratch_dirs_remote_spill_leftover_files_removal.

Change-Id: Iadd49b7384d52bac5ddab4e86cd9f39dc2c88e1b
Reviewed-on: http://gerrit.cloudera.org:8080/22378
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-01-29 05:40:10 +00:00
Riza Suminto
9c87cf41bf IMPALA-13396: Unify tmp dir management in CustomClusterTestSuite
Many custom cluster tests require creating a temporary directory. The
temporary directory typically lives within the scope of a test method
and is cleaned up afterwards. However, some tests create temporary
directories directly and forget to clean them up, leaving junk dirs
under /tmp/ or $LOG_DIR.

This patch unifies temporary directory management inside
CustomClusterTestSuite. It introduces a new 'tmp_dir_placeholders' arg
in CustomClusterTestSuite.with_args() that lists the tmp dirs to
create. 'impalad_args', 'catalogd_args', and 'impala_log_dir' now
accept a formatting pattern that is replaced by a temporary dir path
defined through 'tmp_dir_placeholders'.
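
A hypothetical usage sketch; the placeholder name, module path, and
daemon flag are illustrative:

from tests.common.custom_cluster_test_suite import CustomClusterTestSuite

class TestExample(CustomClusterTestSuite):
  @CustomClusterTestSuite.with_args(
      tmp_dir_placeholders=['spill_dir'],
      impalad_args='--scratch_dirs={spill_dir}')
  def test_spill(self):
    # '{spill_dir}' is replaced with a temporary dir that the suite
    # creates before the test and removes afterwards.
    pass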

There are a few occurrences where mkdtemp is called and is not
replaceable by this work, such as tests/comparison/cluster.py. In
those cases, this patch changes them to supply a prefix arg so that
developers know the directory comes from an Impala test script.

This patch also addressed several flake8 errors in modified files.

Testing:
- Pass custom cluster tests in exhaustive mode.
- Manually ran a few modified tests and observed that the temporary
  dirs are created and removed under logs/custom_cluster_tests/ as the
  tests run.

Change-Id: I8dd665e8028b3f03e5e33d572c5e188f85c3bdf5
Reviewed-on: http://gerrit.cloudera.org:8080/21836
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-10-02 01:25:39 +00:00
Yida Wu
99e8170997 IMPALA-12721: Fix flaky tests involving check_deleted_file_fd()
check_deleted_file_fd() was introduced in IMPALA-12681; however, some
spilling test cases involving it seem flaky.

This patch fixes the issue by adding a retry mechanism within the
check_deleted_file_fd() function. If the function encounters a
failure, it retries the process of verifying the presence of a deleted
referencing file. Based on my local testing, the file is removed after
the test even when the test fails and the call to delete the file
handle comes before the call to remove the file (this has been
confirmed through additional testing logs). While there is no theory
for why this happens, introducing a retry mechanism has allowed the
test case to run successfully 200 times without encountering any
failures. It is possible that a delay occurs at some point in the
process, which leads to this kind of failure.
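
The retry shape, as a minimal sketch (the real check_deleted_file_fd()
differs):

import time

def check_with_retry(check_fn, attempts=5, delay_s=2):
  # Retry the verification a few times; the deleted fd sometimes
  # lingers briefly in /proc before the kernel drops it.
  for _ in range(attempts):
    if check_fn():
      return True
    time.sleep(delay_s)
  return False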

Tests:
Reran the testcase 200 times without a failure.

Change-Id: I900aab7dc9833015ce140253ff40da28a6ed3ba6
Reviewed-on: http://gerrit.cloudera.org:8080/21000
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-02-06 21:53:23 +00:00
Yida Wu
02d004a121 IMPALA-12698: Restrict check_deleted_file_fd() for fixing flaky tests
The introduction of check_deleted_file_fd() in IMPALA-12681 aimed
to detect a bug related to remote spilling where local temporary file
handles were not being released after deletion. However, the tests
associated with this function seem flaky in exhaustive builds:
occasionally some HDFS files are not promptly released after deletion.
Locally, I observed that these files are eventually removed from
/proc/xx/fd within a few minutes; the reason is unclear yet.

To fix the flaky build failure, this patch confines the scope of
check_deleted_file_fd() to detecting only files containing the keyword
"scratch". Given that the HDFS files eventually get removed and this
does not seem to be an urgent issue, a separate Jira will be filed to
track and investigate this behavior further.

Testing:
Reran the tests a couple of times and they passed.

Change-Id: I55f5aa1cdbc0c74f6c7ebd25575e71d2b238bf98
Reviewed-on: http://gerrit.cloudera.org:8080/20898
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-01-15 16:46:00 +00:00
Yida Wu
56a7514ba6 IMPALA-12681: Release file descriptors for partially written temporary files
This patch fixes a bug where partially written temporary files were
removed without releasing their file descriptors. It does so by adding
a call to Close() on the local file writer during Delete() of the
DiskFile class, which can be called when the local buffer file is
being evicted or the query ends, ensuring proper release of the file
handle.

Testing:
Passed core tests.
Additionally, a check has been added in the test
test_scratch_disk.py to verify that there are no deleted
files in the /proc/x/fd/ directory.
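
A rough sketch of such a /proc check (the real test helper differs):

import os

def deleted_fds(pid):
  # The kernel suffixes readlink targets of unlinked-but-open files
  # with ' (deleted)'.
  fd_dir = '/proc/%d/fd' % pid
  result = []
  for fd in os.listdir(fd_dir):
    try:
      target = os.readlink(os.path.join(fd_dir, fd))
    except OSError:
      continue  # fd was closed while we iterated
    if target.endswith('(deleted)'):
      result.append(target)
  return result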

Change-Id: I58a2bac419ced806d6f5a32bcdf24d79e078ab14
Reviewed-on: http://gerrit.cloudera.org:8080/20852
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-01-08 23:55:06 +00:00
Joe McDonnell
eb66d00f9f IMPALA-11974: Fix lazy list operators for Python 3 compatibility
Python 3 changes list operators such as range, map, and filter
to be lazy. Code that expects these operators to evaluate
immediately will fail, e.g.

Python 2:
range(0,5) == [0,1,2,3,4]
True

Python 3:
range(0,5) == [0,1,2,3,4]
False

The fix is to wrap locations with list(). i.e.

Python 3:
list(range(0,5)) == [0,1,2,3,4]
True

Since the base operators are now lazy, Python 3 also removes the
old lazy versions (e.g. xrange, ifilter, izip, etc). This uses
future's builtins package to convert the code to the Python 3
behavior (i.e. xrange -> future's builtins.range).

Most of the changes were done via these futurize fixes:
 - libfuturize.fixes.fix_xrange_with_import
 - lib2to3.fixes.fix_map
 - lib2to3.fixes.fix_filter

This eliminates the pylint warnings:
 - xrange-builtin
 - range-builtin-not-iterating
 - map-builtin-not-iterating
 - zip-builtin-not-iterating
 - filter-builtin-not-iterating
 - reduce-builtin
 - deprecated-itertools-function

Testing:
 - Ran core job

Change-Id: Ic7c082711f8eff451a1b5c085e97461c327edb5f
Reviewed-on: http://gerrit.cloudera.org:8080/19589
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Joe McDonnell
82bd087fb1 IMPALA-11973: Add absolute_import, division to all eligible Python files
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
 1. Python 3 requires absolute imports within packages. This
    can be emulated via "from __future__ import absolute_import"
 2. Python 3 changed division to "true" division that doesn't
    round to an integer. This can be emulated via
    "from __future__ import division"

This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.
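
The header each file now carries:

from __future__ import absolute_import, division, print_function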

I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.

Testing:
 - Ran core tests

Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Joe McDonnell
2b550634d2 IMPALA-11952 (part 2): Fix print function syntax
Python 3 now treats print as a function and requires parentheses in
the invocation.

print "Hello World!"
is now:
print("Hello World!")

This fixes all locations to use the function
invocation. This is more complicated when the output
is being redirected to a file or when avoiding the
usual newline.

print >> sys.stderr , "Hello World!"
is now:
print("Hello World!", file=sys.stderr)

To support this properly and guarantee equivalent behavior
between python 2 and python 3, all files that use print
now add this import:
from __future__ import print_function

This also fixes random flake8 issues that intersect with
the changes.

Testing:
 - check-python-syntax.sh shows no errors related to print

Change-Id: Ib634958369ad777a41e72d80c8053b74384ac351
Reviewed-on: http://gerrit.cloudera.org:8080/19552
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-02-28 17:11:50 +00:00
Michael Smith
a469a9cf19 IMPALA-11730: Add support for spilling to Ozone
Adds support for spilling to Ozone (ofs and o3fs schemes) for parity
with HDFS. Note that ofs paths start with <volume>/<bucket>, which have
naming restrictions; tmp/impala-scratch is a valid name, so something
like ofs://localhost:9862/tmp would work as a scratch directory (volume
tmp, implicit bucket impala-scratch).

Updates tests to determine the correct path from the environment. Fixes
backend tests to work with Ozone as well. Guards test_scratch_disk.py
behind a new flag for filesystems that support spilling. Updates metric
verification to wait for scratch-space-bytes-used to be non-zero, as it
seems to update slower with Ozone.

Refactors TmpDir to remove extraneous variables and functions. Each
implementation is expected to handle its own token parsing.

Initializes default_fs in ExecEnv when using TestEnv. Previously it was
uninitialized, and uses of default_fs would return an empty string.

Testing:
- Ran backend, end-to-end, and custom cluster tests with Ozone.
- Ran test_scratch_disk.py exhaustive runs with Ozone and HDFS.

Change-Id: I5837c30357363f727ca832fb94169f2474fb4f6f
Reviewed-on: http://gerrit.cloudera.org:8080/19251
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-12-08 17:20:33 +00:00
Michael Smith
f8443d9828 IMPALA-11697: Enable SkipIf.not_hdfs tests for Ozone
Convert SkipIf.not_hdfs to SkipIf.not_dfs for tests that require
filesystem semantics, adding more feature test coverage with Ozone.

Creates a separate not_scratch_fs flag for scratch dir tests as they're
not supported with Ozone yet. Filed IMPALA-11730 to address this.

Preserves not_hdfs for a specific test that uses the dfsadmin CLI to put
it in safemode.

Adds sfs_ofs_unsupported for SmallFileSystem tests. This should work for
many of our filesystems based on
ebb1e2fa99/ql/src/java/org/apache/hadoop/hive/ql/io/SingleFileSystem.java (L62-L87). Makes sfs tests work on S3.

Adds hardcoded_uris for IcebergV2 tests where deletes are implemented as
hardcoded URIs in parquet files. Adding a parquet read/write library
for Python is beyond the scope of this patch.

Change-Id: Iafc1dac52d013e74a459fdc4336c26891a256ef1
Reviewed-on: http://gerrit.cloudera.org:8080/19254
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2022-11-21 18:51:30 +00:00
Yida Wu
6dfab93fe9 IMPALA-10791 Add batch reading for remote temporary files
The patch adds a feature to batch-read from a remote temporary file
in order to improve read performance for spilled remote data.

Originally, the design was to use the local disk file as the buffer
for batch reads from the remote file, but in practice this did not
improve performance. Therefore, the design was changed to use memory
as the read buffer.
Currently, each TmpFileRemote has two DiskFiles: one for the remote
file and one for the local buffer. The patch adds MemBlocks to the
local buffer file. Each local buffer file is divided evenly into
several MemBlocks. Moreover, to guarantee that a single page is never
split across two blocks, block sizes can differ slightly from each
other in practice. The default block size is the minimum of the
default file size and MAX_REMOTE_READ_MEM_BLOCK_THRESHOLD_BYTES, which
is 16MB.

When pinning a page, the system detects whether there is enough memory
for the block that holds the page. If yes, the block is stored in
memory until all of the pages in the block are read or the query ends.
If not, we read the page directly and disable the block, since that
avoids duplicated reads of the same content from the remote fs.

One challenge of the read buffer is where to get the extra memory for
it, because when Impala starts to spill data, the process is already
short of memory. By default, the Impala process reserves 20% of the
total system memory as unused memory, and here we use this unused
memory for the read buffer, because it is reasonable to use it for an
emergency case like spilling, and the read buffer's memory is returned
immediately after use.
For system reliability, we set a restriction that the read buffer
memory is no more than 10% of the total system memory and 50% of the
unused memory. Also, if the unused memory is less than 5% of the total
system memory, the read buffer is disabled.

Two start options have been added for the new feature.

1. remote_batch_read. Default is false. If set to true, batch reading
is enabled.
2. remote_read_memory_buffer_size. Default is 1G. The maximum memory
that can be used by the read buffer. This is additionally restricted
to no more than 10% of the process memory limit.
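
An illustrative configuration enabling the feature (values are
examples only):

--remote_batch_read=true
--remote_read_memory_buffer_size=512M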

Added metrics ScratchReadsUseMem/ScratchBytesReadUseMem/
ScratchBytesReadUseLocalDisk to the query profile.

The patch also increases the MAX_REMOTE_TMPFILE_SIZE_THRESHOLD_MB
from 256 to 512.

Tests:
Ran core and exhaustive tests.
Added and ran TmpFileMgrTest::TestBatchReadingFromRemote.
Added e2e test test_scratch_dirs_batch_reading.

Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b
Reviewed-on: http://gerrit.cloudera.org:8080/17979
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-09-28 09:02:08 +00:00
baggio000
1a3ff11d82 IMPALA-10429 Add support for specifying HDFS path in 'scratch_dirs' startup option
We already support HDFS scratch space, but only as a test-only feature
with a fixed default HDFS path.

In this patch, we extend HDFS scratch space to support
customer-specified paths. To support this, we add a new format for the
HDFS scratch space path that requires the HDFS path to include a port
number, resolving the conflict with the current format of the scratch
space path.

Previously, the format for a scratch space path was, taking s3 as an
example, s3a://bucketpath:#bytes:#priority. In this case, the
bucketpath doesn't have a port number.

In this patch, the new format of the HDFS scratch path is
hdfs://ipaddr:#port:#bytes:#priority. The port number is required;
therefore, there must be at least one colon in the HDFS path. The
bytes and priority are optional as before. For other scratch spaces,
the path format doesn't change.
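
For example, an HDFS scratch path with a 200GB limit and priority 1
(the address is illustrative):

--scratch_dirs=hdfs://10.0.0.10:8020:200GB:1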

Also, the option allow_spill_to_hdfs is removed because spilling to
HDFS is no longer a test-only function; as a result, the affected e2e
tests are updated.

Tests:
Added and passed TmpFileMgrTest::TestDirectoryLimitParsingRemotePath.
Ran the Core tests.

Change-Id: I0882ed1e80b02724dd5cb3cdb1fa7b6c2debcbf4
Reviewed-on: http://gerrit.cloudera.org:8080/17720
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-08-17 01:10:36 +00:00
Bikramjeet Vig
06c9016a37 IMPALA-8762: Track host level admission stats across all coordinators
This patch adds the ability to share the per-host stats for locally
admitted queries across all coordinators. This helps to get a more
consolidated view of the cluster for stats like slots_in_use and
mem_admitted when making local admission decisions.

Testing:
Added e2e py test

Change-Id: I2946832e0a89b077d0f3bec755e4672be2088243
Reviewed-on: http://gerrit.cloudera.org:8080/17683
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-07-28 05:33:16 +00:00
Riza Suminto
47219ec366 IMPALA-10565: Adjust result spooling memory based on scratch_limit
IMPALA-9856 enables result spooling by default. Result spooling
depends on the ability to spill its entire BufferedTupleStream to disk
once it hits its maximum memory reservation. However, if the query
option scratch_limit is set lower than max_spilled_result_spooling_mem,
the query might fail in the middle of execution due to insufficient
scratch space. This patch adds a planner change that considers
scratch_limit and scratch_dirs when computing the resources used by
result spooling. The algorithm is as follows (see the sketch after
this list):

* If scratch_dirs is empty or scratch_limit < the minMemReservationBytes
  required to use BufferedPlanRootSink, we set spool_query_results to
  false and fall back to BlockingPlanRootSink.

* If scratch_limit > minMemReservationBytes but still fairly low, we
  lower max_result_spooling_mem (default 100MB) and
  max_spilled_result_spooling_mem (default 1GB) to fit scratch_limit.

* If scratch_limit > max_spilled_result_spooling_mem, do nothing.
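
A hedged Python sketch of the algorithm; the real logic lives in the
Java planner and these names are illustrative:

def adjust_result_spooling(opts, min_mem_reservation, has_scratch_dirs):
  if not has_scratch_dirs or opts['scratch_limit'] < min_mem_reservation:
    # Fall back to BlockingPlanRootSink.
    opts['spool_query_results'] = False
  elif opts['scratch_limit'] < opts['max_spilled_result_spooling_mem']:
    # Lower both spooling limits to fit within scratch_limit.
    for k in ('max_result_spooling_mem', 'max_spilled_result_spooling_mem'):
      opts[k] = min(opts[k], opts['scratch_limit'])
  # else: scratch_limit already covers the spooling memory; do nothing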

Testing:
- Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit
- Verify that spool_query_results query option is disabled in
  TestScratchDir::test_no_dirs
- Pass exhaustive tests.

Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Reviewed-on: http://gerrit.cloudera.org:8080/17166
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Tested-by: Aman Sinha <amsinha@cloudera.com>
2021-03-14 03:35:40 +00:00
Riza Suminto
49ac55fb69 IMPALA-9856: Enable result spooling by default.
Result spooling has been relatively stable since it was introduced,
and it has several benefits described in IMPALA-8656. This patch
enables the result spooling (SPOOL_QUERY_RESULTS) query option by
default.

Furthermore, some tests need to be adjusted to account for result
spooling being on by default. The following are the adjustment
categories and the tests that fall under each category.

Change in assertions:
PlannerTest#testAcidTableScans
PlannerTest#testBloomFilterAssignment
PlannerTest#testConstantFolding
PlannerTest#testFkPkJoinDetection
PlannerTest#testFkPkJoinDetectionWithHDFSNumRowsEstDisabled
PlannerTest#testKuduSelectivity
PlannerTest#testMaxRowSize
PlannerTest#testMinMaxRuntimeFilters
PlannerTest#testMinMaxRuntimeFiltersWithHDFSNumRowsEstDisabled
PlannerTest#testMtDopValidation
PlannerTest#testParquetFiltering
PlannerTest#testParquetFilteringDisabled
PlannerTest#testPartitionPruning
PlannerTest#testPreaggBytesLimit
PlannerTest#testResourceRequirements
PlannerTest#testRuntimeFilterQueryOptions
PlannerTest#testSortExprMaterialization
PlannerTest#testSpillableBufferSizing
PlannerTest#testTableSample
PlannerTest#testTpch
PlannerTest#testKuduTpch
PlannerTest#testTpchNested
PlannerTest#testUnion
TpcdsPlannerTest
custom_cluster/test_admission_controller.py::TestAdmissionController::test_dedicated_coordinator_planner_estimates
custom_cluster/test_admission_controller.py::TestAdmissionController::test_memory_rejection
custom_cluster/test_admission_controller.py::TestAdmissionController::test_pool_mem_limit_configs
metadata/test_explain.py::TestExplain::test_explain_level2
metadata/test_explain.py::TestExplain::test_explain_level3
metadata/test_stats_extrapolation.py::TestStatsExtrapolation::test_stats_extrapolation

Increase BUFFER_POOL_LIMIT:
query_test/test_queries.py::TestQueries::test_analytic_fns
query_test/test_runtime_filters.py::TestRuntimeRowFilters::test_row_filter_reservation
query_test/test_sort.py::TestQueryFullSort::test_multiple_mem_limits_full_output
query_test/test_spilling.py::TestSpillingBroadcastJoins::test_spilling_broadcast_joins
query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_aggs
query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_regression_exhaustive
query_test/test_udfs.py::TestUdfExecution::test_mem_limits

Increase MEM_LIMIT:
query_test/test_mem_usage_scaling.py::TestExchangeMemUsage::test_exchange_mem_usage_scaling
query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_hdfs_scanner_thread_mem_scaling

Increase MAX_ROW_SIZE:
custom_cluster/test_parquet_max_page_header.py::TestParquetMaxPageHeader::test_large_page_header_config
query_test/test_insert.py::TestInsertQueries::test_insert_large_string
query_test/test_query_mem_limit.py::TestQueryMemLimit::test_mem_limit
query_test/test_scanners.py::TestTextSplitDelimiters::test_text_split_across_buffers_delimiter
query_test/test_scanners.py::TestWideRow::test_wide_row

Disable result spooling to maintain assertion:
custom_cluster/test_admission_controller.py::TestAdmissionController::test_set_request_pool
custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_host_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_pool_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_queue_reasons_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_pool_config_change_while_queued
custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_fetched_rows
custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_finished_query
custom_cluster/test_scratch_disk.py::TestScratchDir::test_no_dirs
custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_existing_dirs
custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_writable_dirs
query_test/test_insert.py::TestInsertQueries::test_insert_large_string (the last query only)
query_test/test_kudu.py::TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan
query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_kudu_scan_mem_usage
query_test/test_queries.py::TestQueriesParquetTables::test_very_large_strings
query_test/test_query_mem_limit.py::TestCodegenMemLimit::test_codegen_mem_limit
shell/test_shell_client.py::TestShellClient::test_fetch_size

Testing:
- Pass exhaustive tests.

Change-Id: I9e360c1428676d8f3fab5d95efee18aca085eba4
Reviewed-on: http://gerrit.cloudera.org:8080/16755
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-03-02 04:58:51 +00:00
Yida Wu
bead2ede1c IMPALA-10533: Fix TestScratchDir.test_scratch_dirs_mix_local_and_remote_dir_spill_local_only seems flaky
The E2E test case emulates the situation where there are two types of
scratch directories and data only spills to the local one while the
local directory's space is sufficient. The test case works fine in the
debug build; however, in the release build the system runs faster and
more data is spilled from memory, exceeding the local scratch space
limit. To solve this, the size limit of the local scratch space is
changed from 100M to 2GB, which allows all of the spilled data to stay
in the local rather than the remote directory.

Tests:
Reran test_scratch_dirs_mix_local_and_remote_dir_spill_local_only in
the release build.

Change-Id: If2dc32196b2554aee9fc94a4ccbbf5803dbcce1d
Reviewed-on: http://gerrit.cloudera.org:8080/17102
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-02-25 07:50:36 +00:00
Yida Wu
85e39e170f IMPALA-9867: Add Support for Spilling to S3: Milestone 1
Major Features
1) Local files as buffers for spilling to S3.
2) Async Upload for remote files.
3) Sync remote files deletion after query ends.
4) Local buffer files management.
5) Compatibility of spilling to local and remote.
6) All the errors from hdfs/s3 should terminate the query.

Changes on TmpFile:
* TmpFile is separated into two types of implementation, TmpFileLocal
  and TmpFileRemote.
  TmpFileLocal is used for spilling to the local file system.
  TmpFileRemote is a new type for spilling to the remote. It contains
  two DiskFiles, one for the local buffer and the other for the remote
  file.
* The DiskFile is an object that contains the information of a
  physical file for passing to the DiskIOMgr to execute IO operations
  on that specific file. The DiskFile also contains status information
  about the file, including DiskFileStatus::INWRITING/PERSISTED/DELETED.
  When the DiskFile is initialized, it is in INWRITING status. If the
  file is persisted to the file system, it moves to PERSISTED status.
  If the file is deleted, for example when the local buffer is evicted,
  the buffer file's DiskFile status becomes DELETED. After that, if the
  file is being fetched from the remote, the buffer file's DiskFile
  status becomes INWRITING again, and then PERSISTED if the fetch
  finishes successfully.

Implementation Details:
1) A new enum type is added to specify the disk type of files,
   indicating where the file physically resides.
   The types include DiskFileType::LOCAL/LOCAL_BUFFER/DFS/S3.
   DiskFileType::LOCAL indicates the file is in the local file system.
   DiskFileType::LOCAL_BUFFER indicates the file is in the local file
   system and is the buffer of a remote scratch file.
   DiskFileType::DFS/S3 indicates the file is in HDFS/S3.
   The local buffer allows the buffer pool to pin (read) pages, but for
   remote files the buffer pool mainly pins (reads) pages from the
   remote file system.
2) Two disk queues have been added to do the file operation jobs.
   Queue names: RemoteS3DiskFileOper/RemoteDfsDiskFileOper
   File operations on the remote disk, like upload and fetch, should
   be done in these queues. The purpose of the queues is to isolate
   the file operations from normal read/write IO operations. This
   increases the efficiency of the file operations, which are not
   interrupted during their relatively long execution times, and also
   provides more accurate control over the number of threads working
   on file operation jobs.
   RemoteOperRange is the new type that carries the file operation
   jobs. Previously, we had the request types READ and WRITE; now
   FILE_FETCH/FILE_UPLOAD are added.
3) The tmp files are physically deleted when the tmp file group is
   destructed. For remote files, the entire directory is deleted.
4) Local buffer file management controls the total size of local
   buffer files and evicts files if needed.
   A local buffer file can be evicted if the temporary file has
   uploaded a copy to the remote disk or the query ends.
   There are two modes that decide the order in which files are
   evicted. The default is LIFO; the other is FIFO. This can be
   controlled by the startup option remote_tmp_files_avail_pool_lifo.
   Also, a TmpFileSpaceReserveThreadLoop thread in TmpFileMgr runs to
   reserve buffer file space asynchronously, to avoid deadlocks.
   The startup option allow_spill_to_hdfs is added. By default an HDFS
   path is not allowed, but for test cases the option can be set to
   true to allow the use of an HDFS path as scratch space, for testing
   only.
   The startup option wait_for_spill_buffer_timeout_s is added to
   control the maximum duration to wait for a buffer in the
   TmpFileBufferPool. The default value is 60, meaning 60 seconds.
5) Spilling to local has higher priority than spilling to remote.
   If no local scratch space is available, temporary data will be
   spilled to remote.
   The first available local directory is used as the local buffer
   for spilling to remote if any remote directory is configured.
   If a remote directory is configured without any available local
   scratch space, an error is returned during initialization.
   The purpose of this design is to simplify the implementation in
   milestone 1 with fewer changes to the configuration.

Example (setting remote scratch space):
Assume that the directories we have for scratch space:
* Local dir: /tmp/local_buffer, /tmp/local, /tmp/local_sec
* Remote dir: s3a://tmp/remote
The scratch space path is configured in the startup options, and could
have three types of configurations:
1. Pure local scratch space
  --scratch_dirs="/tmp/local"
2. Pure remote scratch space
  --scratch_dirs="s3a://tmp/remote,/tmp/local_buffer:16GB"
3. Mixed local and remote scratch space
  --scratch_dirs="s3a://tmp/romote:200GB,/tmp/local_buffer:1GB,
/tmp/local:2GB, /tmp/local_sec:16GB"
* Type 1: a pure local scratch space with unlimited size.
* Type 2: a pure remote scratch space with a 16GB local buffer.
* Type 3: a mixed local and remote scratch space; the local buffer for
the remote directory is 1GB, the local scratch spaces are 2GB and
16GB, and the remote scratch space bytes limit is 200GB. Remote
scratch space is used only when all of the local spaces are at
capacity.
* Note: The first local directory becomes the local buffer path if a
remote scratch space is registered.

Limitations:
* Only one remote scratch dir is supported.
* The first local scratch dir is used for the buffer of remote scratch
  space if remote scratch dir exists.

Testcases:
* Ran pre-review-test
* Unit Tests added to
  tmp-file-mgr-test/disk-io-mgr-test/buffer-pool-test.
* E2E Tests added to custom_cluster/test_scratch_disk.py.
* Ran Unit Tests:
$IMPALA_HOME/be/build/debug/runtime/buffered-tuple-stream-test
$IMPALA_HOME/be/build/debug/runtime/tmp-file-mgr-test
$IMPALA_HOME/be/build/debug/runtime/bufferpool/buffer-pool-test
$IMPALA_HOME/be/build/debug/runtime/io/disk-io-mgr-test
* Ran E2E Tests:
custom_cluster/test_scratch_disk.py

Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89
Reviewed-on: http://gerrit.cloudera.org:8080/16318
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-02-19 23:54:44 +00:00
Abhishek Rawat
3b9ae415e2 IMPALA-9697: Support priority based scratch directory selection
The '--scratch_dirs' configuration option now supports specifying the
priority of a scratch directory. The lower the numeric value, the
higher the priority. If the priority is not specified, a default
priority of numeric_limits<int>::max() is used.

Valid formats for specifying the priority are:
- <dir-path>:<limit>:<priority>
- <dir-path>::<priority>
Following formats use default priority:
- <dir-path>
- <dir-path>:<limit>
- <dir-path>:<limit>:

The new logic in TmpFileGroup::AllocateSpace() tries to find a target
file using a prioritized round-robin scheme. Files are ordered in
decreasing order of their priority. The priority of a file is the same
as the priority of the related directory. A target file is selected by
always searching the ordered list starting from the file with the
highest priority. If multiple files have the same priority, the target
file is selected in a round-robin manner.
Testing:
- Added unit and e2e tests for priority based spilling logic.

Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Reviewed-on: http://gerrit.cloudera.org:8080/16091
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-06-26 10:20:41 +00:00
Tim Armstrong
ebbe52b4be IMPALA-3766: optionally compress spilled data
Enabled via --disk_spill_compression_codec, which uses
the same syntax as the compression_codec query option.
Recommended codecs are LZ4 and ZSTD. ZSTD supports
specifying a compression level.

The compression is done in TmpFileMgr using a temporary
buffer. Allocation of disk space is reworked slightly
so that the allocation can happen after compression.

The default power-of-two disk block sizes would lead
to a lot of internal fragmentation, so a new strategy
for free space management, similar to that used in
the data cache, can be used with
--disk_spill_punch_holes=true. TmpFileMgr will allocate
a range of the actual compressed size and punch holes
in the file for each range that is no longer needed.
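
An illustrative startup configuration combining the two options (the
ZSTD level is an example):

  --disk_spill_compression_codec=ZSTD:3
  --disk_spill_punch_holes=true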

UncompressedWriteIoBytes is added to the buffer pool
profiles, so that you can see what degree of compression
is achieved. Typically I saw ratios of 2-3x for LZ4 and
ZSTD (with LZ4 toward the lower end and ZSTD toward
the higher end).

Limitations:
The management of the compression buffer memory could
be improved. Ideally it would be integrated with the
buffer pool and use the buffer pool allocator instead
of being done "on the side". We would probably want to
do this before making this the default, for resource
management and performance reasons (doing a malloc()
directly does not use the caching supported by the
buffer pool).

Testing:
* Run buffer pool spilling tests with different combinations of
  the new options.
* Extend existing TmpFileMgr tests for file space allocation to
  run with hole punching enabled.
* Switch a couple of spilling tests to use the new option.
* Add a metrics test to check for scratch leaks.
* Enable the new options by default for end-to-end dockerized
  tests to get additional coverage.
* Add a unit test where allocating compression memory fails,
  both on the read and write path.
* Ran a single-node stress test on TPC-DS SF 1 and TPC-H SF 10
  The peak compression buffer usage was ~40MB.

Perf:
I ran this spilling query using an SSD as the scratch disk:

  set mem_limit=200m;
  select count(distinct l_partkey) from
  tpch30_parquet.lineitem;

The time taken for the second run of each query was:
No compression: 19.59s
LZ4: 18.56s
ZSTD: 20.59s

Change-Id: I9c08ff9504097f0fee8c32316c5c150136abe659
Reviewed-on: http://gerrit.cloudera.org:8080/15454
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
2020-03-31 01:36:44 +00:00
Tim Armstrong
4fb8e8e324 IMPALA-8816: reduce custom cluster test runtime in core
This includes some optimisations and a bulk move of tests
to exhaustive.

Move a bunch of custom cluster tests to exhaustive. I selected
these partially based on runtime (i.e. I looked most carefully
at the tests that ran for over a minute) and the likelihood
of them catching a precommit bug.  Regression tests for specific
edge cases and tests for parts of the code that are very stable
were prime candidates.

Remove an unnecessary cluster restart in test_breakpad.

Merge test_scheduler_error into test_failpoints to avoid an unnecessary
cluster restart.

Speed up cluster starts by ensuring that the default statestore args are
applied even when _start_impala_cluster() is called directly. This
shaves a couple of seconds off each restart. We made the default args
use a faster update frequency - see IMPALA-7185 - but they did not
take effect in all tests.

Change-Id: Ib2e3e7ebc9695baec4d69183387259958df10f62
Reviewed-on: http://gerrit.cloudera.org:8080/13967
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-08-06 21:34:26 +00:00
Tim Armstrong
f9ced753ba IMPALA-7999: clean up start-*d.sh scripts
Delete these wrapper scripts and replace with a generic
start-daemon.sh script that sets environment variables
without the other logic.

Move the logic for setting JAVA_TOOL_OPTIONS into
start-impala-cluster.py.

Remove some options like -jvm_suspend, -gdb, -perf that
may not be used. These can be reintroduced if needed.

Port across the kerberized minicluster logic (which has
probably bitrotted) in case it needs to be revived.

Remove --verbose option that didn't appear to be useful
(it claims to print daemon output to the console,
but output is still redirected regardless).

Removed a level of quoting in custom cluster test argument
handling - this was made unnecessary by properly escaping
arguments with pipes.escape() in run_daemon().

Testing:
* Ran exhaustive tests.
* Ran on CentOS 6 to confirm we didn't reintroduce Popen issue
  worked around by kwho.

Change-Id: Ib67444fd4def8da119db5d3a0832ef1de15b068b
Reviewed-on: http://gerrit.cloudera.org:8080/12271
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-02-05 13:10:08 +00:00
Tim Armstrong
3e4015c601 IMPALA-7179: allow_multiple_scratch_dirs_per_device=true by default
The previous default was often confusing to users of Impala. It is
simpler to do exactly what is asked instead of trying to fix bad
configurations automatically.

Testing:
Ran core tests.

Change-Id: I23394c9949ae4cd0a21d7bb25551371b3198e76c
Reviewed-on: http://gerrit.cloudera.org:8080/10736
Reviewed-by: anujphadke <aphadke@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-16 07:51:42 +00:00
Tim Armstrong
fb5dc9eb48 IMPALA-4835: switch I/O buffers to buffer pool
This is the following squashed patches that were reverted.

I will fix the known issues with some follow-on patches.

======================================================================
IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation

In preparation for switching the I/O mgr to the buffer pool, this
removes and cleans up a lot of code so that the switchover patch starts
from a cleaner slate.

* Remove the free buffer cache (which will be replaced by buffer pool's
  own caching).
* Make memory limit exceeded error checking synchronous (in anticipation
  of having to propagate buffer pool errors synchronously).
* Simplify error propagation - remove the (ineffectual) code that
  enqueued BufferDescriptors containing error statuses.
* Document locking scheme better in a few places, make it part of the
  function signature when it seemed reasonable.
* Move ReturnBuffer() to ScanRange, because it is intrinsically
  connected with the lifecycle of a scan range.
* Separate external ReturnBuffer() and internal CleanUpBuffer()
  interfaces - previously callers of ReturnBuffer() were fudging
  the num_buffers_in_reader accounting to make the external interface work.
* Eliminate redundant state in ScanRange: 'eosr_returned_' and
  'is_cancelled_'.
* Clarify the logic around calling Close() for the last
  BufferDescriptor.
  -> There appeared to be an implicit assumption that buffers would be
     freed in the order they were returned from the scan range, so that
     the "eos" buffer was returned last. Instead just count the number
     of outstanding buffers to detect the last one.
  -> Touching the is_cancelled_ field without holding a lock was hard to
     reason about - violated locking rules and it was unclear that it
     was race-free.
* Remove DiskIoMgr::Read() to simplify the interface. It is trivial to
  inline at the callsites.

This will probably regress performance somewhat because of the cache
removal, so my plan is to merge it around the same time as switching
the I/O mgr to allocate from the buffer pool. I'm keeping the patches
separate to make reviewing easier.

Testing:
* Ran exhaustive tests
* Ran the disk-io-mgr-stress-test overnight

======================================================================
IMPALA-4835: Part 2: Allocate scan range buffers upfront

This change is a step towards reserving memory for buffers from the
buffer pool and constraining per-scanner memory requirements. This
change restructures the DiskIoMgr code so that each ScanRange operates
with a fixed set of buffers that are allocated upfront and recycled as
the I/O mgr works through the ScanRange.

One major change is that ScanRanges get blocked when a buffer is not
available and get unblocked when a client returns a buffer via
ReturnBuffer(). I was able to remove the logic to maintain the
blocked_ranges_ list by instead adding a separate set with all ranges
that are active.

There is also some miscellaneous cleanup included - e.g. reducing the
amount of code devoted to maintaining counters and metrics.

One tricky part of the existing code was that it called
IssueInitialRanges() with empty lists of files and depended on
DiskIoMgr::AddScanRanges() to not check for cancellation in that case.
See IMPALA-6564/IMPALA-6588. I changed the logic to not try to issue
ranges for empty lists of files.

I plan to merge this along with the actual buffer pool switch, but
separated it out to allow review of the DiskIoMgr changes separate from
other aspects of the buffer pool switchover.

Testing:
* Ran core and exhaustive tests.

======================================================================
IMPALA-4835: Part 3: switch I/O buffers to buffer pool

This is the final patch to switch the Disk I/O manager to allocate all
buffer from the buffer pool and to reserve the buffers required for
a query upfront.

* The planner reserves enough memory to run a single scanner per
  scan node.
* The multi-threaded scan node must increase reservation before
  spinning up more threads.
* The scanner implementations must be careful to stay within their
  assigned reservation.

The row-oriented scanners were most straightforward, since they only
have a single scan range active at a time. A single I/O buffer is
sufficient to scan the whole file but more I/O buffers can improve I/O
throughput.

Parquet is more complex because it issues a scan range per column and
the sizes of the columns on disk are not known during planning. To
deal with this, the reservation in the frontend is based on a
heuristic involving the file size and # columns. The Parquet scanner
can then divvy up reservation to columns based on the size of column
data on disk.

I adjusted how the 'mem_limit' is divided between buffer pool and non
buffer pool memory for low mem_limits to account for the increase in
buffer pool memory.

Testing:
* Added more planner tests to cover reservation calcs for scan node.
* Test scanners for all file formats with the reservation denial debug
  action, to test behaviour when the scanners hit reservation limits.
* Updated memory and buffer pool limits for tests.
* Added unit tests for dividing reservation between columns in parquet,
  since the algorithm is non-trivial.

Perf:
I ran TPC-H and targeted perf locally comparing with master. Both
showed small improvements of a few percent and no regressions of
note. Cluster perf tests showed no significant change.

Change-Id: I3ef471dc0746f0ab93b572c34024fc7343161f00
Reviewed-on: http://gerrit.cloudera.org:8080/9679
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2018-04-28 23:41:39 +00:00
Tim Armstrong
161cbe30ff Revert IMPALA-4835 and dependent changes
Revert "IMPALA-6585: increase test_low_mem_limit_q21 limit"

This reverts commit 25bcb258df.

Revert "IMPALA-6588: don't add empty list of ranges in text scan"

This reverts commit d57fbec6f6.

Revert "IMPALA-4835: Part 3: switch I/O buffers to buffer pool"

This reverts commit 24b4ed0b29.

Revert "IMPALA-4835: Part 2: Allocate scan range buffers upfront"

This reverts commit 5699b59d0c.

Revert "IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation"

This reverts commit 65680dc421.

Change-Id: Ie5ca451cd96602886b0a8ecaa846957df0269cbb
Reviewed-on: http://gerrit.cloudera.org:8080/9480
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
2018-03-03 04:22:12 +00:00
Tim Armstrong
24b4ed0b29 IMPALA-4835: Part 3: switch I/O buffers to buffer pool
This is the final patch to switch the Disk I/O manager to allocate all
buffer from the buffer pool and to reserve the buffers required for
a query upfront.

* The planner reserves enough memory to run a single scanner per
  scan node.
* The multi-threaded scan node must increase reservation before
  spinning up more threads.
* The scanner implementations must be careful to stay within their
  assigned reservation.

The row-oriented scanners were most straightforward, since they only
have a single scan range active at a time. A single I/O buffer is
sufficient to scan the whole file but more I/O buffers can improve I/O
throughput.

Parquet is more complex because it issues a scan range per column and
the sizes of the columns on disk are not known during planning. To
deal with this, the reservation in the frontend is based on a
heuristic involving the file size and # columns. The Parquet scanner
can then divvy up reservation to columns based on the size of column
data on disk.

I adjusted how the 'mem_limit' is divided between buffer pool and non
buffer pool memory for low mem_limits to account for the increase in
buffer pool memory.

Testing:
* Added more planner tests to cover reservation calcs for scan node.
* Test scanners for all file formats with the reservation denial debug
  action, to test behaviour when the scanners hit reservation limits.
* Updated memory and buffer pool limits for tests.
* Added unit tests for dividing reservation between columns in parquet,
  since the algorithm is non-trivial.

Perf:
I ran TPC-H and targeted perf locally comparing with master. Both
showed small improvements of a few percent and no regressions of
note. Cluster perf tests showed no significant change.

Change-Id: Ic09c6196b31e55b301df45cc56d0b72cfece6786
Reviewed-on: http://gerrit.cloudera.org:8080/8966
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2018-02-23 04:17:41 +00:00
Tim Armstrong
7007cd85fe IMPALA-5772: failure in test_scratch_disk.TestScratchDir didn't occur
The buffer pool changes reduced the memory requirement for the sort,
which seems to have been enough to get the query to execute without
spilling on S3. Reduce the limit in the test to force it to spill.

Testing:
Ran in a loop locally for an hour. Ran custom cluster tests on S3.

Change-Id: If65fee3e6a4b759d0d18e30a1c30bd48db0f2a54
Reviewed-on: http://gerrit.cloudera.org:8080/7615
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-08-09 00:04:25 +00:00
Tim Armstrong
a98b90bd38 IMPALA-4674: Part 2: port backend exec to BufferPool
Always create global BufferPool at startup using 80% of memory and
limit reservations to 80% of query memory (same as BufferedBlockMgr).
The query's initial reservation is computed in the planner, claimed
centrally (managed by the InitialReservations class) and distributed
to query operators from there.

min_spillable_buffer_size and default_spillable_buffer_size query
options control the buffer size that the planner selects for
spilling operators.
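
For example, these can be set per session like any other query option
(shown here via impyla; host/port and the values are illustrative):

  from impala.dbapi import connect  # impyla client

  conn = connect(host='localhost', port=21050)
  cur = conn.cursor()
  cur.execute('SET MIN_SPILLABLE_BUFFER_SIZE=65536')        # 64KB floor
  cur.execute('SET DEFAULT_SPILLABLE_BUFFER_SIZE=2097152')  # 2MB default
  cur.execute('select 1')  # later queries in this session use these sizes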

Port ExecNodes to use BufferPool:
  * Each ExecNode has to claim its reservation during Open()
  * Port Sorter to use BufferPool.
  * Switch from BufferedTupleStream to BufferedTupleStreamV2
  * Port HashTable to use BufferPool via a Suballocator.

This also makes PAGG memory consumption more efficient (avoiding wasted
buffers) and improves the spilling algorithm:
* Allow preaggs to execute with 0 reservation - if streams and hash tables
  cannot be allocated, they pass through rows (sketched below).
* Halve the buffer requirement for spilling aggs - avoid allocating
  buffers for aggregated and unaggregated streams simultaneously.
* Rebuild spilled partitions instead of repartitioning (IMPALA-2708)
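
A sketch of the pass-through behaviour from the first bullet
(hypothetical names; Python for illustration):

  # Hypothetical sketch: with no reservation, the preagg cannot build a
  # hash table, so it forwards rows unaggregated to the merge
  # aggregation instead of failing the query.
  def process_row(preagg, row):
      if preagg.hash_table is None and not preagg.try_allocate_hash_table():
          preagg.output_unaggregated(row)  # pass-through path
      else:
          preagg.insert_or_aggregate(row)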

TODO in follow-up patches:
* Rename BufferedTupleStreamV2 to BufferedTupleStream
* Implement max_row_size query option.

Testing:
* Updated tests to reflect new memory requirements

Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e
Reviewed-on: http://gerrit.cloudera.org:8080/5801
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-08-05 01:03:02 +00:00
Lars Volker
8b7f876649 IMPALA-4722: Disable log caching in test_scratch_disk
test_scratch_disk fails sporadically when trying to assert the presence
of log messages. This is probably caused by log caching, since after
such failures the log files do contain the lines in question.

I manually tested this by running the tests repeatedly for 2 days (10k
runs).

To make future diagnosis of similar problems easier, this change also
adds more output to assert_impalad_log_contains().
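
The extra output is along these lines (a sketch only; the helper's real
signature and message differ):

  import re

  def assert_log_contains(log_path, line_regex, expected_count=1):
      # On mismatch, report the regex, the counts, and the file checked,
      # so a sporadic failure is diagnosable from the test output alone.
      with open(log_path) as f:
          found = sum(1 for line in f if re.search(line_regex, line))
      assert found == expected_count, (
          "found %d matches of '%s' in %s, expected %d"
          % (found, line_regex, log_path, expected_count))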

Change-Id: I9f21284338ee7b4374aca249b6556282b0148389
Reviewed-on: http://gerrit.cloudera.org:8080/5669
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-01-12 18:58:48 +00:00
Tim Armstrong
95ed4434f2 IMPALA-3202,IMPALA-2079: rework scratch file I/O
Refactor BufferedBlockMgr/TmpFileMgr to push more I/O logic into
TmpFileMgr, in anticipation of it being shared with BufferPool.
TmpFileMgr now handles:
* Scratch space allocation and recycling
* Read and write I/O

The interface is also greatly changed so that it is built around Write()
and Read() calls, abstracting away the details of temporary file
allocation from clients. This means the TmpFileMgr::File class can
be hidden from clients.
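
Roughly, the new shape of the interface (a sketch only; the real API is
C++ and the exact signatures differ):

  # Hypothetical sketch: callers hand TmpFileMgr a buffer and get back
  # an opaque handle; which scratch file and offset back the data is
  # internal to TmpFileMgr.
  def spill_and_restore(tmp_file_mgr, buffer, dest_buffer, on_write_done):
      handle = tmp_file_mgr.write(buffer, on_done=on_write_done)
      # ... later, when the spilled data is needed again:
      tmp_file_mgr.read(handle, dest_buffer)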

Write error recovery:
Also implement write error recovery in TmpFileMgr.

If an error occurs while writing to scratch and we have multiple
scratch directories, we will try one of the other directories
before cancelling the query. File-level blacklisting is used to
prevent excessive repeated attempts to resize a scratch file during
a single query. Device-level blacklisting is not implemented because
it is problematic to permanently take a scratch directory out of use.
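
A minimal sketch of that recovery loop, with hypothetical names (the
real logic lives in TmpFileMgr):

  def write_with_recovery(scratch_files, data):
      # Try each scratch file in turn, blacklisting a file once it fails
      # so the same query does not keep retrying it. Device-level
      # blacklisting is deliberately not done.
      for f in scratch_files:
          if f.blacklisted:
              continue
          try:
              return f.write(data)
          except IOError:
              f.blacklisted = True
      raise IOError('no usable scratch directories; cancelling query')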

To reduce the number of error paths, all I/O errors are now handled
asynchronously. Previously errors creating or extending the file were
returned synchronously from WriteUnpinnedBlock(). This required
modifying DiskIoMgr to create the file if not present when opened.

Also set the default max_errors value in the thrift definition file,
so that it is in effect for backend tests.

Future Work:
* Support for recycling variable-length scratch file ranges. I omitted
  this to avoid making the patch even larger.

Testing:
Updated BufferedBlockMgr unit test to reflect changes in behaviour:
* Scratch space is no longer permanently associated with a block, and
  is remapped every time a new block is written to disk.
* Files are now blacklisted - updated existing tests and enabled the
  previously disabled blacklisting test.

Added some basic testing of recycling of scratch file ranges in
the TmpFileMgr unit test.

I also manually tested the code in two ways. First by removing permissions
for /tmp/impala-scratch and ensuring that a spilling query fails cleanly.
Second, by creating a tiny ramdisk (16M) and running with two scratch
directories: one on /tmp and one on the tiny ramdisk. When spilling, an
out-of-space error is encountered for the tiny ramdisk and Impala spills
the remaining data (72M) to /tmp.

Change-Id: I8c9c587df006d2f09d72dd636adafbd295fcdc17
Reviewed-on: http://gerrit.cloudera.org:8080/5141
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-01-05 02:26:24 +00:00
Dan Hecht
ffa7829b70 IMPALA-3918: Remove Cloudera copyrights and add ASF license header
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:

http://www.apache.org/legal/src-headers.html#headers

Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
   http://www.apache.org/legal/src-headers.html#notice
   to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
   on the website.

Much of this change was automatically generated via:

git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]

Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.

[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
    modification to ORIG_LICENSE to match Impala's license text.

Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-08-09 08:19:41 +00:00
Taras Bobrovytsky
609b80410e Clean up Python test import statements
Many of our test scripts have import statements that look like
"from xxx import *". It is better practice to explicitly name what
needs to be imported; this commit adopts that practice and also
removes unused import statements.
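
For instance (module and names shown are illustrative):

  # Before: pulls every public name into the test module's namespace.
  from tests.common.impala_test_suite import *

  # After: the dependency is explicit, and unused names stand out.
  from tests.common.impala_test_suite import ImpalaTestSuite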

Change-Id: I6a33bb66552ae657d1725f765842f648faeb26a8
Reviewed-on: http://gerrit.cloudera.org:8080/3444
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
2016-07-15 23:26:18 +00:00
Tim Armstrong
c1093ed861 IMPALA-3669: test_scratch_disk fails on S3
Make the test deterministic by using max_block_mgr_memory instead of
mem_limit, so that the non-deterministic scanner memory usage does not
influence the spilling behaviour of the queries.

Testing:
Ran the test locally to confirm that it succeeded. Also manually
computed the memory requirement. The data size to be sorted is ~220MB,
so with a 64MB block manager limit per node, at least one node must
spill.

Change-Id: I9525a029ac020bb5b8bea210a741c9f9c5ec3c75
Reviewed-on: http://gerrit.cloudera.org:8080/3318
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Silvius Rus <srus@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2016-06-09 17:31:00 -07:00
Michael Brown
067af1957c IMPALA-3614: work around pytest bugs causing custom cluster test skips
All versions of pytest contain various bugs regarding test marking
(including skips) when tests are both:

1. class-level marked
2. inherited

More info is available in IMPALA-3614 and IMPALA-2943, but the gist is
that it's possible for some tests to be skipped when they shouldn't be.
This is happening pretty badly with the custom cluster tests, because
CustomClusterTestSuite has a class-level skipif mark.

The easiest workaround for now is to remove the pytest skipif mark in
CustomClusterTestSuite and skip using explicit pytest.skip() in the
setup_class() method. Some CustomClusterTestSuite children implemented
their own setup_* methods; I adjusted them both to clean them up and to
call the parent methods properly via super().
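
The workaround looks roughly like this (the skip condition shown is
hypothetical):

  import pytest
  from tests.common.impala_test_suite import ImpalaTestSuite

  class CustomClusterTestSuite(ImpalaTestSuite):
      @classmethod
      def setup_class(cls):
          # Replaces the unreliable class-level @pytest.mark.skipif: the
          # decision now runs inside setup, which pytest honours even
          # for inherited tests.
          if cls.exploration_strategy() != 'exhaustive':  # hypothetical
              pytest.skip('only runs in exhaustive mode')
          super(CustomClusterTestSuite, cls).setup_class()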

Testing:

I ran the following combinations of all the custom cluster tests:

DEBUG   / HDFS  / core
RELEASE / HDFS  / exhaustive
DEBUG   / LOCAL / core
DEBUG   / S3    / core

Before, we'd get situations in which most of the tests were skipped.
Consider the RELEASE/HDFS/exhaustive situation:

  custom_cluster/test_admission_controller.py .....
  custom_cluster/test_alloc_fail.py ss
  custom_cluster/test_breakpad.py sssss
  custom_cluster/test_delegation.py sss
  custom_cluster/test_exchange_delays.py ss
  custom_cluster/test_hdfs_fd_caching.py s
  custom_cluster/test_hive_parquet_timestamp_conversion.py ss
  custom_cluster/test_insert_behaviour.py ss
  custom_cluster/test_legacy_joins_aggs.py s
  custom_cluster/test_parquet_max_page_header.py s
  custom_cluster/test_permanent_udfs.py sss
  custom_cluster/test_query_expiration.py sss
  custom_cluster/test_redaction.py ssss
  custom_cluster/test_s3a_access.py s
  custom_cluster/test_scratch_disk.py ssss
  custom_cluster/test_session_expiration.py s
  custom_cluster/test_spilling.py ssss
  authorization/test_authorization.py ss
  authorization/test_grant_revoke.py s

Now, more tests run appropriately:

  custom_cluster/test_admission_controller.py .....
  custom_cluster/test_alloc_fail.py ss
  custom_cluster/test_breakpad.py sssss
  custom_cluster/test_delegation.py ...
  custom_cluster/test_exchange_delays.py ss
  custom_cluster/test_hdfs_fd_caching.py .
  custom_cluster/test_hive_parquet_timestamp_conversion.py ..
  custom_cluster/test_insert_behaviour.py ..
  custom_cluster/test_kudu_not_available.py .
  custom_cluster/test_legacy_joins_aggs.py .
  custom_cluster/test_parquet_max_page_header.py .
  custom_cluster/test_permanent_udfs.py ...
  custom_cluster/test_query_expiration.py ...
  custom_cluster/test_redaction.py ....
  custom_cluster/test_s3a_access.py s
  custom_cluster/test_scratch_disk.py ....
  custom_cluster/test_session_expiration.py .
  custom_cluster/test_spilling.py ....
  authorization/test_authorization.py ..
  authorization/test_grant_revoke.py .

Change-Id: Ie301b69718f8690322cc3b4130fb1c715344779c
Reviewed-on: http://gerrit.cloudera.org:8080/3265
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Michael Brown <mikeb@cloudera.com>
2016-06-06 17:34:07 -07:00
Taras Bobrovytsky
9dcf857ddb IMPALA-3368: Fix race in test_scratch_disk.py
The problem is that xdist spins up multiple processes and each
process tries to create the same temp dirs. This commit fixes the
issue by creating a randomly named dir to avoid the conflict.
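
The fix amounts to something like this (illustrative):

  import tempfile

  # Each pytest-xdist worker gets its own uniquely named directory, so
  # concurrent processes never race to create the same path.
  scratch_dir = tempfile.mkdtemp(prefix='test_scratch_disk_')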

Change-Id: Ic0764843ace00aef8c9b01139906e01ab5213047
Reviewed-on: http://gerrit.cloudera.org:8080/2817
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:35 -07:00
Tim Armstrong
1d2afcfec2 IMPALA-2079: Part 1: report non-writable scratch dirs at startup
Previously Impala could erroneously decide to use non-writable scratch
directories, e.g. if /tmp/impala-scratch already exists and is not
writable by the current user.

With this change, if we cannot remove and recreate a fresh scratch directory,
it is not used.  If we have no valid scratch directories, we log an
error and continue startup.
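
In sketch form (Python for illustration; the real check runs in the C++
startup path):

  import os
  import shutil

  def usable_scratch_dirs(candidate_dirs):
      usable = []
      for d in candidate_dirs:
          try:
              # Remove any stale directory (it may be owned by another
              # user) and recreate it to prove we can write there.
              if os.path.exists(d):
                  shutil.rmtree(d)
              os.makedirs(d)
              usable.append(d)
          except OSError:
              pass  # not writable: skip it rather than failing startup
      return usable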

Add unit test for CreateDirectory to test behavior for success and
failure cases.

Add system tests to check logging and query execution in various
scenarios where we do not have scratch available.

Modify FilesystemUtil to use non-exception-throwing Boost functions to
avoid unhandled exceptions escaping into the rest of the Impala
codebase, which does not expect the use of exceptions.

Change-Id: Icaa8429051942424e1d811c54bde10102ac7f7b3
Reviewed-on: http://gerrit.cloudera.org:8080/565
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-08-14 00:38:22 +00:00
Casey Ching
074e5b4349 Remove hashbang from non-script python files
Many python files had a hashbang and the executable bit set though
they were not intended to be run as standalone scripts. That makes
it very difficult to determine which python files are actually scripts.
A future patch will update the hashbang in real python scripts so they
use $IMPALA_HOME/bin/impala-python.
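
With the hashbangs gone, a genuine script can be recognized by its
first line and mode bits, e.g. (illustrative):

  import os

  def looks_like_script(path):
      # After this change, only real entry points keep a '#!' first line
      # and the executable bit.
      with open(path) as f:
          first_line = f.readline()
      return first_line.startswith('#!') and os.access(path, os.X_OK)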

Change-Id: I04eafdc73201feefe65b85817a00474e182ec2ba
Reviewed-on: http://gerrit.cloudera.org:8080/599
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
2015-08-04 05:26:07 +00:00
Taras Bobrovytsky
7faaa65996 Added order by query tests
- Added static order by tests to test_queries.py and QueryTest/sort.test
- test_order_by.py also contains tests with static queries that are run with
  multiple memory limits.
- Added stress, scratch disk and failpoints tests
- Incorporated Srinath's change that copied all order by with limit tests into
  the top-n.test file

Extra time required:

Serial:
scratch disk: 42 seconds
test queries sort : 77 seconds
test sort: 56 seconds
sort stress: 142 seconds
TOTAL: 5 min 17 seconds

Parallel(8 threads):
scratch disk: 40 seconds
test queries sort: 42 seconds
test sort: 49 seconds
sort stress: 93 seconds
TOTAL: 3 min 44 sec

Change-Id: Ic5716bcfabb5bb3053c6b9cebc9bfbbb9dc64a7c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2820
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3205
2014-06-20 13:35:10 -07:00