42 Commits

Author SHA1 Message Date
Riza Suminto
28cff4022d IMPALA-14333: Run impala-py.test using Python3
Running exhaustive tests with the env var IMPALA_USE_PYTHON3_TESTS=true
reveals some tests that require adjustment. This patch makes those
adjustments, which mostly revolve around encoding differences and the
str vs bytes distinction in Python 3. It also switches the default to
running pytest with Python 3 by setting IMPALA_USE_PYTHON3_TESTS=true.
The following are the details:

Change the hash() function in conftest.py to crc32() to produce a
deterministic hash. Hash randomization has been enabled by default
since Python 3.3 (see
https://docs.python.org/3/reference/datamodel.html#object.__hash__),
which caused test sharding (like --shard_tests=1/2) to produce an
inconsistent set of tests per shard. Always restart the minicluster
during custom cluster tests if the --shard_tests argument is set,
because the test order may change and affect test correctness,
depending on whether a test runs on a fresh minicluster or not.
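
A minimal sketch of the deterministic sharding idea, assuming a helper
along these lines (the actual conftest.py code may differ):

import zlib

def shard_for(test_name, num_shards):
  # zlib.crc32 is stable across interpreter runs, unlike hash(), which
  # Python 3.3+ randomizes per process.
  return zlib.crc32(test_name.encode('utf-8')) % num_shards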

Moved one test case from delimited-latin-text.test to
test_delimited_text.py for easier binary comparison.

Add bytes_to_str() as a utility function to decode bytes in Python3.
This is often needed when inspecting the return value of
subprocess.check_output() as a string.
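
A hedged sketch of such a utility (the real helper's name and defaults
may differ):

def bytes_to_str(val, encoding='utf-8'):
  # subprocess.check_output() returns bytes under Python 3; decode so
  # callers can keep treating the result as str.
  return val.decode(encoding) if isinstance(val, bytes) else val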

Implement DataTypeMetaclass.__lt__ to substitute for
DataTypeMetaclass.__cmp__, which is ignored in Python 3 (see
https://peps.python.org/pep-0207/).
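
Illustrative only, since Python 3 ignores __cmp__ and requires rich
comparisons; the actual keying logic in the test suite differs:

class DataTypeMetaclass(type):
  def __lt__(cls, other):
    # Order data type classes so sorted() works under Python 3.
    return cls.__name__ < other.__name__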

Fix WEB_CERT_ERR difference in test_ipv6.py.

Fix trivial integer parsing in test_restart_services.py.

Fix various encoding issues in test_saml2_sso.py,
test_shell_commandline.py, and test_shell_interactive.py.

Change timeout in Impala.for_each_impalad() from sys.maxsize to 2^31-1.

Switch to binary comparison in test_iceberg.py where needed.

Specify text mode when calling tempfile.NamedTemporaryFile().
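
For example (illustrative):

import tempfile

# NamedTemporaryFile defaults to binary mode ('w+b'); mode='w' keeps
# str writes working the same under Python 2 and 3.
with tempfile.NamedTemporaryFile(mode='w') as f:
  f.write('select 1;\n')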

Simplify create_impala_shell_executable_dimension to skip testing the
dev and python2 impala-shell when IMPALA_USE_PYTHON3_TESTS=true. The
reason is that several UTF-8 related tests in test_shell_commandline.py
break in the Python 3 pytest + Python 2 impala-shell combination. This
skipping already happens automatically on build OSes without a system
Python 2 available, such as RHEL 9 (where the IMPALA_SYSTEM_PYTHON2 env
var is empty).

Removed unused vector argument and fixed some trivial flake8 issues.

Several tests require logic modifications due to intermittent issues
in Python 3 pytest. These include:

Add _run_query_with_client() in test_ranger.py to allow reusing a
single Impala client for running several queries, and ensure clients
are closed when the test is done. Mark several tests in test_ranger.py
with SkipIfFS.hive because they run queries through beeline +
HiveServer2, but the Ozone and S3 build environments do not start
HiveServer2 by default.

Increase the sleep period from 0.1 to 0.5 seconds per iteration in
test_statestore.py and mark TestStatestore to execute serially. This
is because TServer appears to shut down more slowly when run
concurrently with other tests. Also handle the deprecation of
Thread.setDaemon().

Always set force_restart=True for each test method in TestLoggingCore,
TestShellInteractiveReconnect, and TestQueryRetries to prevent them
from reusing the minicluster from a previous test method. Some of
these tests tear down the minicluster (kill impalad) and will produce
a minidump if the metrics verifier for the next test fails to detect a
healthy minicluster state.

Testing:
Pass exhaustive tests with IMPALA_USE_PYTHON3_TESTS=true.

Change-Id: I401a93b6cc7bcd17f41d24e7a310e0c882a550d4
Reviewed-on: http://gerrit.cloudera.org:8080/23319
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-03 10:01:29 +00:00
Joe McDonnell
1d0b2ef0c5 IMPALA-14164: Fix timeout for fragments in flight in TestScratchDir
On release builds, some tests in TestScratchDir have started hitting
a timeout waiting for num-fragments-in-flight to reach 2. The code
to wait for the metric sleeps one second between samples. If one of
the query fragments starts and finishes during that second, the test
will never see a sample containing two in-flight fragments. This
happens on release builds because they are faster and more likely to
complete within that second.
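
A hypothetical sketch of the racy wait loop (names are illustrative,
not the actual test code):

import time

def wait_for_metric(get_metric, expected, timeout_s=60):
  deadline = time.time() + timeout_s
  while time.time() < deadline:
    if get_metric('num-fragments-in-flight') == expected:
      return
    # A fragment that starts and finishes inside this one-second sleep
    # is never observed, so the loop can time out on fast builds.
    time.sleep(1)
  raise AssertionError('metric never reached %d' % expected)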

This removes the code that waits for num-fragments-in-flight. All the
tests have subsequent calls waiting for the scratch usage to reach a
certain value. This will properly wait for the fragment to start up
on its own. The num-fragments-in-flight wait doesn't add anything.

Testing:
 - Ran custom_cluster/test_scratch_disk.py multiple times with a
   release build

Change-Id: Ic8c573affc033056ba465c42bd420d5c1d3ba15c
Reviewed-on: http://gerrit.cloudera.org:8080/23081
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-06-25 01:10:09 +00:00
Csaba Ringhofer
f98b697c7b IMPALA-13929: Make 'functional-query' the default workload in tests
This change adds get_workload() to ImpalaTestSuite and removes it
from all test suites that already returned 'functional-query'.
get_workload() is also removed from CustomClusterTestSuite which
used to return 'tpch'.

All other changes besides impala_test_suite.py and
custom_cluster_test_suite.py are just mass removals of
get_workload() functions.
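
A minimal sketch of the new default, assuming the method is a
classmethod on the base suite:

class ImpalaTestSuite(object):
  @classmethod
  def get_workload(cls):
    return 'functional-query'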

The behavior is only changed in custom cluster tests that didn't
override get_workload(). By returning 'functional-query' instead
of 'tpch', exploration_strategy() will no longer return 'core' in
'exhaustive' test runs. See IMPALA-3947 on why workload affected
exploration_strategy. An example of an affected test is
TestCatalogHMSFailures, which was skipped in both core and exhaustive
runs before this change.

get_workload() functions that return a workload other than
'functional-query' are not changed; it is possible that some of these
also don't handle exploration_strategy() as expected, but individually
checking those tests is out of scope for this patch.

Change-Id: I9ec6c41ffb3a30e1ea2de773626d1485c69fe115
Reviewed-on: http://gerrit.cloudera.org:8080/22726
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-08 07:12:55 +00:00
Riza Suminto
00dc79adf6 IMPALA-13907: Remove reference to create_beeswax_client
This patch replaces create_beeswax_client() references with
create_hs2_client() or vector-based client creation, to prepare for
the hs2 test migration.

test_session_expiration_with_queued_query is changed to use
impala.dbapi directly from Impyla due to a limitation in
ImpylaHS2Connection.
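
For reference, connecting through Impyla's DB-API looks roughly like
this (host and port are illustrative):

from impala.dbapi import connect

conn = connect(host='localhost', port=21050)
cursor = conn.cursor()
cursor.execute('select 1')
print(cursor.fetchall())
conn.close()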

TestAdmissionControllerRawHS2 is migrated to use hs2 as the default
test protocol.

Modify test_query_expiration.py to set query options through the
client instead of a SET query. test_query_expiration is slightly
modified due to a behavior difference in the hs2 ImpylaHS2Connection.

Remove remaining reference to BeeswaxConnection.QueryState.

Fixed a bug in ImpylaHS2Connection.wait_for_finished_timeout().

Fix some easy flake8 issues caught through this command:
git show HEAD --name-only | grep '^tests.*py' \
  | xargs -I {} impala-flake8 {} \
  | grep -e U100 -e E111 -e E301 -e E302 -e E303 -e F...

Testing:
- Pass exhaustive tests.

Change-Id: I1d84251835d458cc87fb8fedfc20ee15aae18d51
Reviewed-on: http://gerrit.cloudera.org:8080/22700
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-29 18:37:45 +00:00
Yida Wu
ddd4f4f8d6 IMPALA-13798: Cleanup host-level remote scratch dir on shutdown
IMPALA-13677 introduced cleanup of remote scratch files on startup,
and also introduced a host-level directory to make cleanup easier.

An empty host-level directory in remote storage does not use
resources, but it can stay there forever if a host goes offline
permanently or its hostname changes. To improve this behavior, this
patch adds support for removing the remote host-level temporary
directory on shutdown. This helps prevent too many empty directories
being left in the remote scratch path, especially for hosts that
never restart.

Changed the flag remote_scratch_cleanup_on_startup to
remote_scratch_cleanup_on_start_stop, so that this flag also
controls cleanup on shutdown.

Tests:
Passed exhaustive tests.
Added an e2e testcase
test_scratch_dirs_remote_dir_removal_on_shutdown.

Change-Id: Ic8f446894afdf975630aef80a9d964a9a78d3b46
Reviewed-on: http://gerrit.cloudera.org:8080/22549
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-28 19:24:00 +00:00
Yida Wu
a159eb52f8 IMPALA-13677: Support remote scratch directory cleanup at Impala daemon startup
This patch introduces a new feature for cleaning up remote scratch
files during Impala daemon startup, ensuring that potential leftover
files from abnormal shutdowns are removed.

To allow efficient cleanup, this patch also refines the remote
scratch directory hierarchy by adding a host-level directory,
changing it from:
<base_dir>/<backend_id>_<query_id>/<file_name>
to:
<base_dir>/<hostname>/<backend_id>_<query_id>/<file_name>
<base_dir> is <scratch_dir_config_path>/impala-scratch.

During startup, if the host-level directory exists, it will be
removed entirely.

This design assumes one Impala daemon per host, and it also
assumes that multiple Impala clusters don't share the same
scratch_dir path on the remote filesystem. Even if they share the
same prefix, each Impala cluster should have dedicated paths:
--scratch_dirs=hdfs://remote_dir/scratch/impala1
--scratch_dirs=hdfs://remote_dir/scratch/impala2

Also added a flag, remote_scratch_cleanup_on_startup, to control
whether the host-level directory is cleaned during Impala daemon
startup. By default, this feature is enabled. If multiple daemons on a
host or multiple clusters share the same remote scratch_dir path, it
can be set to false to prevent unintended cleanup.

Tests:
Passed exhaustive tests.
Adds testcase test_scratch_dirs_remote_spill_leftover_files_removal.

Change-Id: Iadd49b7384d52bac5ddab4e86cd9f39dc2c88e1b
Reviewed-on: http://gerrit.cloudera.org:8080/22378
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-01-29 05:40:10 +00:00
Riza Suminto
9c87cf41bf IMPALA-13396: Unify tmp dir management in CustomClusterTestSuite
Many custom cluster tests require creating a temporary directory. The
temporary directory typically lives within the scope of a test method
and is cleaned up afterwards. However, some tests create temporary
directories directly and forget to clean them up, leaving junk dirs
under /tmp/ or $LOG_DIR.

This patch unifies temporary directory management inside
CustomClusterTestSuite. It introduces a new 'tmp_dir_placeholders' arg
in CustomClusterTestSuite.with_args() that lists the tmp dirs to
create. 'impalad_args', 'catalogd_args', and 'impala_log_dir' now
accept a formatting pattern that is replaced by a temporary dir path
defined through 'tmp_dir_placeholders'.
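
A hypothetical usage sketch; the placeholder name, module path, and
daemon flag are illustrative:

from tests.common.custom_cluster_test_suite import CustomClusterTestSuite

class TestExample(CustomClusterTestSuite):
  @CustomClusterTestSuite.with_args(
      tmp_dir_placeholders=['spill_dir'],
      impalad_args='--scratch_dirs={spill_dir}')
  def test_spill(self):
    # '{spill_dir}' is replaced with a temporary dir that the suite
    # creates before the test and removes afterwards.
    pass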

There are a few occurrences where mkdtemp is called and is not
replaceable by this work, such as tests/comparison/cluster.py. In
those cases, this patch changes them to supply a prefix arg so that
developers know the directory comes from an Impala test script.

This patch also addressed several flake8 errors in modified files.

Testing:
- Pass custom cluster tests in exhaustive mode.
- Manually ran a few modified tests and observed that the temporary
  dirs are created and removed under logs/custom_cluster_tests/ as the
  tests run.

Change-Id: I8dd665e8028b3f03e5e33d572c5e188f85c3bdf5
Reviewed-on: http://gerrit.cloudera.org:8080/21836
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-10-02 01:25:39 +00:00
Yida Wu
99e8170997 IMPALA-12721: Fix flaky tests involving check_deleted_file_fd()
check_deleted_file_fd() was introduced in IMPALA-12681; however, some
spilling test cases involving it seem flaky.

This patch fixes the issue by adding a retry mechanism within the
check_deleted_file_fd() function. If the function encounters a
failure, it retries the process of verifying the presence of a deleted
referencing file. Based on my local testing, the file is removed after
the test even when the test fails and the call to delete the file
handle comes before the call to remove the file (this has been
confirmed through additional testing logs). While there is no theory
for why this happens, introducing a retry mechanism has allowed the
test case to run successfully 200 times without encountering any
failures. It is possible that a delay occurs at some point in the
process, which leads to this kind of failure.
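
The retry shape, as a minimal sketch (the real check_deleted_file_fd()
differs):

import time

def check_with_retry(check_fn, attempts=5, delay_s=2):
  # Retry the verification a few times; the deleted fd sometimes
  # lingers briefly in /proc before the kernel drops it.
  for _ in range(attempts):
    if check_fn():
      return True
    time.sleep(delay_s)
  return False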

Tests:
Reran the testcase 200 times without a failure.

Change-Id: I900aab7dc9833015ce140253ff40da28a6ed3ba6
Reviewed-on: http://gerrit.cloudera.org:8080/21000
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-02-06 21:53:23 +00:00
Yida Wu
02d004a121 IMPALA-12698: Restrict check_deleted_file_fd() for fixing flaky tests
The introduction of check_deleted_file_fd() in IMPALA-12681 aimed
to detect a bug related to remote spilling where local temporary file
handles were not being released after deletion. However, the tests
associated with this function seem flaky in exhaustive builds:
occasionally some HDFS files are not promptly released after deletion.
Locally, I observed that these files are eventually removed from
/proc/xx/fd within a few minutes; the reason is unclear yet.

To fix the flaky build failure, this patch confines the scope of
check_deleted_file_fd() to detecting only files containing the keyword
"scratch". Given that the HDFS files eventually get removed and this
does not seem to be an urgent issue, a separate Jira will be filed to
track and investigate this behavior further.

Testing:
Reran the tests a couple of times and they passed.

Change-Id: I55f5aa1cdbc0c74f6c7ebd25575e71d2b238bf98
Reviewed-on: http://gerrit.cloudera.org:8080/20898
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-01-15 16:46:00 +00:00
Yida Wu
56a7514ba6 IMPALA-12681: Release file descriptors for partially written temporary files
This patch fixes a bug where partially written temporary files were
removed without releasing their file descriptors. It does so by adding
a call to Close() on the local file writer during Delete() of the
DiskFile class, which can be called when the local buffer file is
being evicted or the query ends, ensuring proper release of the file
handle.

Testing:
Passed core tests.
Additionally, a check has been added in the test
test_scratch_disk.py to verify that there are no deleted
files in the /proc/x/fd/ directory.
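
A rough sketch of such a /proc check (the real test helper differs):

import os

def deleted_fds(pid):
  # The kernel suffixes readlink targets of unlinked-but-open files
  # with ' (deleted)'.
  fd_dir = '/proc/%d/fd' % pid
  result = []
  for fd in os.listdir(fd_dir):
    try:
      target = os.readlink(os.path.join(fd_dir, fd))
    except OSError:
      continue  # fd was closed while we iterated
    if target.endswith('(deleted)'):
      result.append(target)
  return result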

Change-Id: I58a2bac419ced806d6f5a32bcdf24d79e078ab14
Reviewed-on: http://gerrit.cloudera.org:8080/20852
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-01-08 23:55:06 +00:00
Joe McDonnell
eb66d00f9f IMPALA-11974: Fix lazy list operators for Python 3 compatibility
Python 3 changes list operators such as range, map, and filter
to be lazy. Code that expects these operators to evaluate
immediately will fail, e.g.

Python 2:
range(0,5) == [0,1,2,3,4]
True

Python 3:
range(0,5) == [0,1,2,3,4]
False

The fix is to wrap locations with list(). i.e.

Python 3:
list(range(0,5)) == [0,1,2,3,4]
True

Since the base operators are now lazy, Python 3 also removes the
old lazy versions (e.g. xrange, ifilter, izip, etc). This uses
future's builtins package to convert the code to the Python 3
behavior (i.e. xrange -> future's builtins.range).

Most of the changes were done via these futurize fixes:
 - libfuturize.fixes.fix_xrange_with_import
 - lib2to3.fixes.fix_map
 - lib2to3.fixes.fix_filter

This eliminates the pylint warnings:
 - xrange-builtin
 - range-builtin-not-iterating
 - map-builtin-not-iterating
 - zip-builtin-not-iterating
 - filter-builtin-not-iterating
 - reduce-builtin
 - deprecated-itertools-function

Testing:
 - Ran core job

Change-Id: Ic7c082711f8eff451a1b5c085e97461c327edb5f
Reviewed-on: http://gerrit.cloudera.org:8080/19589
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Joe McDonnell
82bd087fb1 IMPALA-11973: Add absolute_import, division to all eligible Python files
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
 1. Python 3 requires absolute imports within packages. This
    can be emulated via "from __future__ import absolute_import"
 2. Python 3 changed division to "true" division that doesn't
    round to an integer. This can be emulated via
    "from __future__ import division"

This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.
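
The header each file now carries:

from __future__ import absolute_import, division, print_function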

I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.

Testing:
 - Ran core tests

Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Joe McDonnell
2b550634d2 IMPALA-11952 (part 2): Fix print function syntax
Python 3 now treats print as a function and requires parentheses in
the invocation.

print "Hello World!"
is now:
print("Hello World!")

This fixes all locations to use the function
invocation. This is more complicated when the output
is being redirected to a file or when avoiding the
usual newline.

print >> sys.stderr , "Hello World!"
is now:
print("Hello World!", file=sys.stderr)

To support this properly and guarantee equivalent behavior
between python 2 and python 3, all files that use print
now add this import:
from __future__ import print_function

This also fixes random flake8 issues that intersect with
the changes.

Testing:
 - check-python-syntax.sh shows no errors related to print

Change-Id: Ib634958369ad777a41e72d80c8053b74384ac351
Reviewed-on: http://gerrit.cloudera.org:8080/19552
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-02-28 17:11:50 +00:00
Michael Smith
a469a9cf19 IMPALA-11730: Add support for spilling to Ozone
Adds support for spilling to Ozone (ofs and o3fs schemes) for parity
with HDFS. Note that ofs paths start with <volume>/<bucket>, which have
naming restrictions; tmp/impala-scratch is a valid name, so something
like ofs://localhost:9862/tmp would work as a scratch directory (volume
tmp, implicit bucket impala-scratch).

Updates tests to determine the correct path from the environment. Fixes
backend tests to work with Ozone as well. Guards test_scratch_disk.py
behind a new flag for filesystems that support spilling. Updates metric
verification to wait for scratch-space-bytes-used to be non-zero, as it
seems to update slower with Ozone.

Refactors TmpDir to remove extraneous variables and functions. Each
implementation is expected to handle its own token parsing.

Initializes default_fs in ExecEnv when using TestEnv. Previously it was
uninitialized, and uses of default_fs would return an empty string.

Testing:
- Ran backend, end-to-end, and custom cluster tests with Ozone.
- Ran test_scratch_disk.py exhaustive runs with Ozone and HDFS.

Change-Id: I5837c30357363f727ca832fb94169f2474fb4f6f
Reviewed-on: http://gerrit.cloudera.org:8080/19251
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-12-08 17:20:33 +00:00
Michael Smith
f8443d9828 IMPALA-11697: Enable SkipIf.not_hdfs tests for Ozone
Convert SkipIf.not_hdfs to SkipIf.not_dfs for tests that require
filesystem semantics, adding more feature test coverage with Ozone.

Creates a separate not_scratch_fs flag for scratch dir tests as they're
not supported with Ozone yet. Filed IMPALA-11730 to address this.

Preserves not_hdfs for a specific test that uses the dfsadmin CLI to put
it in safemode.

Adds sfs_ofs_unsupported for SmallFileSystem tests. This should work for
many of our filesystems based on
ebb1e2fa99/ql/src/java/org/apache/hadoop/hive/ql/io/SingleFileSystem.java (L62-L87). Makes sfs tests work on S3.

Adds hardcoded_uris for IcebergV2 tests where deletes are implemented as
hardcoded URIs in parquet files. Adding a parquet read/write library
for Python is beyond the scope of this patch.

Change-Id: Iafc1dac52d013e74a459fdc4336c26891a256ef1
Reviewed-on: http://gerrit.cloudera.org:8080/19254
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2022-11-21 18:51:30 +00:00
Yida Wu
6dfab93fe9 IMPALA-10791 Add batch reading for remote temporary files
The patch adds a feature to batch-read from a remote temporary file
in order to improve read performance for spilled remote data.

Originally, the design was to use the local disk file as the buffer
for batch reads from the remote file, but in practice this did not
improve performance. Therefore, the design was changed to use memory
as the read buffer.
Currently, each TmpFileRemote has two DiskFiles: one for the remote
file and one for the local buffer. The patch adds MemBlocks to the
local buffer file. Each local buffer file is divided evenly into
several MemBlocks. Moreover, to guarantee that a single page is never
split across two blocks, block sizes can differ slightly from each
other in practice. The default block size is the minimum of the
default file size and MAX_REMOTE_READ_MEM_BLOCK_THRESHOLD_BYTES, which
is 16MB.

When pinning a page, the system detects whether there is enough memory
for the block that holds the page. If yes, the block is stored in
memory until all of the pages in the block are read or the query ends.
If not, we read the page directly and disable the block, since that
avoids duplicated reads of the same content from the remote fs.

One challenge of the read buffer is where to get the extra memory for
it, because when Impala starts to spill data, the process is already
short of memory. By default, the Impala process reserves 20% of the
total system memory as unused memory, and here we use this unused
memory for the read buffer, because it is reasonable to use it for an
emergency case like spilling, and the read buffer's memory is returned
immediately after use.
For system reliability, we set a restriction that the read buffer
memory is no more than 10% of the total system memory and 50% of the
unused memory. Also, if the unused memory is less than 5% of the total
system memory, the read buffer is disabled.

Two start options have been added for the new feature.

1. remote_batch_read. Default is false. If set to true, batch reading
is enabled.
2. remote_read_memory_buffer_size. Default is 1G. The maximum memory
that can be used by the read buffer. This is additionally restricted
to no more than 10% of the process memory limit.
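
An illustrative configuration enabling the feature (values are
examples only):

--remote_batch_read=true
--remote_read_memory_buffer_size=512M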

Added metrics ScratchReadsUseMem/ScratchBytesReadUseMem/
ScratchBytesReadUseLocalDisk to the query profile.

The patch also increases the MAX_REMOTE_TMPFILE_SIZE_THRESHOLD_MB
from 256 to 512.

Tests:
Ran core and exhaustive tests.
Added and ran TmpFileMgrTest::TestBatchReadingFromRemote.
Added e2e test test_scratch_dirs_batch_reading.

Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b
Reviewed-on: http://gerrit.cloudera.org:8080/17979
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-09-28 09:02:08 +00:00
baggio000
1a3ff11d82 IMPALA-10429 Add support for specifying HDFS path in 'scratch_dirs' startup option
We already support HDFS scratch space, but only as a test-only feature
with a fixed default HDFS path.

In this patch, we extend HDFS scratch space to support
customer-specified paths. To support this, we add a new format for the
HDFS scratch space path that requires the HDFS path to include a port
number, resolving the conflict with the current format of the scratch
space path.

Previously, the format for a scratch space path was, taking s3 as an
example, s3a://bucketpath:#bytes:#priority. In this case, the
bucketpath doesn't have a port number.

In this patch, the new format of the HDFS scratch path is
hdfs://ipaddr:#port:#bytes:#priority. The port number is required;
therefore, there must be at least one colon in the HDFS path. The
bytes and priority are optional as before. For other scratch spaces,
the path format doesn't change.
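
For example, an HDFS scratch path with a 200GB limit and priority 1
(the address is illustrative):

--scratch_dirs=hdfs://10.0.0.10:8020:200GB:1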

Also, the option allow_spill_to_hdfs is removed because spilling to
HDFS is no longer a test-only function; as a result, the affected e2e
tests are updated.

Tests:
Added and passed TmpFileMgrTest::TestDirectoryLimitParsingRemotePath.
Ran the Core tests.

Change-Id: I0882ed1e80b02724dd5cb3cdb1fa7b6c2debcbf4
Reviewed-on: http://gerrit.cloudera.org:8080/17720
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-08-17 01:10:36 +00:00
Bikramjeet Vig
06c9016a37 IMPALA-8762: Track host level admission stats across all coordinators
This patch adds the ability to share the per-host stats for locally
admitted queries across all coordinators. This helps to get a more
consolidated view of the cluster for stats like slots_in_use and
mem_admitted when making local admission decisions.

Testing:
Added e2e py test

Change-Id: I2946832e0a89b077d0f3bec755e4672be2088243
Reviewed-on: http://gerrit.cloudera.org:8080/17683
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-07-28 05:33:16 +00:00
Riza Suminto
47219ec366 IMPALA-10565: Adjust result spooling memory based on scratch_limit
IMPALA-9856 enables result spooling by default. Result spooling
depends on the ability to spill its entire BufferedTupleStream to disk
once it hits its maximum memory reservation. However, if the query
option scratch_limit is set lower than max_spilled_result_spooling_mem,
the query might fail in the middle of execution due to insufficient
scratch space. This patch adds a planner change that considers
scratch_limit and scratch_dirs when computing the resources used by
result spooling. The algorithm is as follows (see the sketch after
this list):

* If scratch_dirs is empty or scratch_limit < the minMemReservationBytes
  required to use BufferedPlanRootSink, we set spool_query_results to
  false and fall back to BlockingPlanRootSink.

* If scratch_limit > minMemReservationBytes but still fairly low, we
  lower max_result_spooling_mem (default 100MB) and
  max_spilled_result_spooling_mem (default 1GB) to fit scratch_limit.

* If scratch_limit > max_spilled_result_spooling_mem, do nothing.
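
A hedged Python sketch of the algorithm; the real logic lives in the
Java planner and these names are illustrative:

def adjust_result_spooling(opts, min_mem_reservation, has_scratch_dirs):
  if not has_scratch_dirs or opts['scratch_limit'] < min_mem_reservation:
    # Fall back to BlockingPlanRootSink.
    opts['spool_query_results'] = False
  elif opts['scratch_limit'] < opts['max_spilled_result_spooling_mem']:
    # Lower both spooling limits to fit within scratch_limit.
    for k in ('max_result_spooling_mem', 'max_spilled_result_spooling_mem'):
      opts[k] = min(opts[k], opts['scratch_limit'])
  # else: scratch_limit already covers the spooling memory; do nothing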

Testing:
- Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit
- Verify that spool_query_results query option is disabled in
  TestScratchDir::test_no_dirs
- Pass exhaustive tests.

Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Reviewed-on: http://gerrit.cloudera.org:8080/17166
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Tested-by: Aman Sinha <amsinha@cloudera.com>
2021-03-14 03:35:40 +00:00
Riza Suminto
49ac55fb69 IMPALA-9856: Enable result spooling by default.
Result spooling has been relatively stable since it was introduced,
and it has several benefits described in IMPALA-8656. This patch
enables the result spooling (SPOOL_QUERY_RESULTS) query option by
default.

Furthermore, some tests need to be adjusted to account for result
spooling being on by default. The following are the adjustment
categories and the tests that fall under each category.

Change in assertions:
PlannerTest#testAcidTableScans
PlannerTest#testBloomFilterAssignment
PlannerTest#testConstantFolding
PlannerTest#testFkPkJoinDetection
PlannerTest#testFkPkJoinDetectionWithHDFSNumRowsEstDisabled
PlannerTest#testKuduSelectivity
PlannerTest#testMaxRowSize
PlannerTest#testMinMaxRuntimeFilters
PlannerTest#testMinMaxRuntimeFiltersWithHDFSNumRowsEstDisabled
PlannerTest#testMtDopValidation
PlannerTest#testParquetFiltering
PlannerTest#testParquetFilteringDisabled
PlannerTest#testPartitionPruning
PlannerTest#testPreaggBytesLimit
PlannerTest#testResourceRequirements
PlannerTest#testRuntimeFilterQueryOptions
PlannerTest#testSortExprMaterialization
PlannerTest#testSpillableBufferSizing
PlannerTest#testTableSample
PlannerTest#testTpch
PlannerTest#testKuduTpch
PlannerTest#testTpchNested
PlannerTest#testUnion
TpcdsPlannerTest
custom_cluster/test_admission_controller.py::TestAdmissionController::test_dedicated_coordinator_planner_estimates
custom_cluster/test_admission_controller.py::TestAdmissionController::test_memory_rejection
custom_cluster/test_admission_controller.py::TestAdmissionController::test_pool_mem_limit_configs
metadata/test_explain.py::TestExplain::test_explain_level2
metadata/test_explain.py::TestExplain::test_explain_level3
metadata/test_stats_extrapolation.py::TestStatsExtrapolation::test_stats_extrapolation

Increase BUFFER_POOL_LIMIT:
query_test/test_queries.py::TestQueries::test_analytic_fns
query_test/test_runtime_filters.py::TestRuntimeRowFilters::test_row_filter_reservation
query_test/test_sort.py::TestQueryFullSort::test_multiple_mem_limits_full_output
query_test/test_spilling.py::TestSpillingBroadcastJoins::test_spilling_broadcast_joins
query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_aggs
query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_regression_exhaustive
query_test/test_udfs.py::TestUdfExecution::test_mem_limits

Increase MEM_LIMIT:
query_test/test_mem_usage_scaling.py::TestExchangeMemUsage::test_exchange_mem_usage_scaling
query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_hdfs_scanner_thread_mem_scaling

Increase MAX_ROW_SIZE:
custom_cluster/test_parquet_max_page_header.py::TestParquetMaxPageHeader::test_large_page_header_config
query_test/test_insert.py::TestInsertQueries::test_insert_large_string
query_test/test_query_mem_limit.py::TestQueryMemLimit::test_mem_limit
query_test/test_scanners.py::TestTextSplitDelimiters::test_text_split_across_buffers_delimiter
query_test/test_scanners.py::TestWideRow::test_wide_row

Disable result spooling to maintain assertion:
custom_cluster/test_admission_controller.py::TestAdmissionController::test_set_request_pool
custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_host_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_pool_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_queue_reasons_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_pool_config_change_while_queued
custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_fetched_rows
custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_finished_query
custom_cluster/test_scratch_disk.py::TestScratchDir::test_no_dirs
custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_existing_dirs
custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_writable_dirs
query_test/test_insert.py::TestInsertQueries::test_insert_large_string (the last query only)
query_test/test_kudu.py::TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan
query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_kudu_scan_mem_usage
query_test/test_queries.py::TestQueriesParquetTables::test_very_large_strings
query_test/test_query_mem_limit.py::TestCodegenMemLimit::test_codegen_mem_limit
shell/test_shell_client.py::TestShellClient::test_fetch_size

Testing:
- Pass exhaustive tests.

Change-Id: I9e360c1428676d8f3fab5d95efee18aca085eba4
Reviewed-on: http://gerrit.cloudera.org:8080/16755
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-03-02 04:58:51 +00:00
Yida Wu
bead2ede1c IMPALA-10533: Fix TestScratchDir.test_scratch_dirs_mix_local_and_remote_dir_spill_local_only seems flaky
The E2E test case emulates the situation where there are two types of
scratch directories and data only spills to the local one while the
local directory's space is sufficient. The test case works fine in the
debug build; however, in the release build the system runs faster and
more data is spilled from memory, exceeding the local scratch space
limit. To solve this, the size limit of the local scratch space is
changed from 100M to 2GB, which allows all of the spilled data to stay
in the local rather than the remote directory.

Tests:
Reran test_scratch_dirs_mix_local_and_remote_dir_spill_local_only in
the release build.

Change-Id: If2dc32196b2554aee9fc94a4ccbbf5803dbcce1d
Reviewed-on: http://gerrit.cloudera.org:8080/17102
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-02-25 07:50:36 +00:00
Yida Wu
85e39e170f IMPALA-9867: Add Support for Spilling to S3: Milestone 1
Major Features
1) Local files as buffers for spilling to S3.
2) Async Upload for remote files.
3) Sync remote files deletion after query ends.
4) Local buffer files management.
5) Compatibility of spilling to local and remote.
6) All the errors from hdfs/s3 should terminate the query.

Changes on TmpFile:
* TmpFile is separated into two types of implementation, TmpFileLocal
  and TmpFileRemote.
  TmpFileLocal is used for spilling to the local file system.
  TmpFileRemote is a new type for spilling to the remote. It contains
  two DiskFiles, one for the local buffer and the other for the remote
  file.
* The DiskFile is an object that contains the information of a
  physical file for passing to the DiskIOMgr to execute IO operations
  on that specific file. The DiskFile also contains status information
  about the file, including DiskFileStatus::INWRITING/PERSISTED/DELETED.
  When the DiskFile is initialized, it is in INWRITING status. If the
  file is persisted to the file system, it moves to PERSISTED status.
  If the file is deleted, for example when the local buffer is evicted,
  the buffer file's DiskFile status becomes DELETED. After that, if the
  file is being fetched from the remote, the buffer file's DiskFile
  status becomes INWRITING again, and then PERSISTED if the fetch
  finishes successfully.

Implementation Details:
1) A new enum type is added to specify the disk type of files,
   indicating where the file physically resides.
   The types include DiskFileType::LOCAL/LOCAL_BUFFER/DFS/S3.
   DiskFileType::LOCAL indicates the file is in the local file system.
   DiskFileType::LOCAL_BUFFER indicates the file is in the local file
   system and is the buffer of a remote scratch file.
   DiskFileType::DFS/S3 indicates the file is in HDFS/S3.
   The local buffer allows the buffer pool to pin (read) pages, but for
   remote files the buffer pool mainly pins (reads) pages from the
   remote file system.
2) Two disk queues have been added to do the file operation jobs.
   Queue names: RemoteS3DiskFileOper/RemoteDfsDiskFileOper
   File operations on the remote disk, like upload and fetch, should
   be done in these queues. The purpose of the queues is to isolate
   the file operations from normal read/write IO operations. This
   increases the efficiency of the file operations, which are not
   interrupted during their relatively long execution times, and also
   provides more accurate control over the number of threads working
   on file operation jobs.
   RemoteOperRange is the new type that carries the file operation
   jobs. Previously, we had the request types READ and WRITE; now
   FILE_FETCH/FILE_UPLOAD are added.
3) The tmp files are physically deleted when the tmp file group is
   destructed. For remote files, the entire directory is deleted.
4) Local buffer file management controls the total size of local
   buffer files and evicts files if needed.
   A local buffer file can be evicted if the temporary file has
   uploaded a copy to the remote disk or the query ends.
   There are two modes that decide the order in which files are
   evicted. The default is LIFO; the other is FIFO. This can be
   controlled by the startup option remote_tmp_files_avail_pool_lifo.
   Also, a TmpFileSpaceReserveThreadLoop thread in TmpFileMgr runs to
   reserve buffer file space asynchronously, to avoid deadlocks.
   The startup option allow_spill_to_hdfs is added. By default an HDFS
   path is not allowed, but for test cases the option can be set to
   true to allow the use of an HDFS path as scratch space, for testing
   only.
   The startup option wait_for_spill_buffer_timeout_s is added to
   control the maximum duration to wait for a buffer in the
   TmpFileBufferPool. The default value is 60, meaning 60 seconds.
5) Spilling to local has higher priority than spilling to remote.
   If no local scratch space is available, temporary data will be
   spilled to remote.
   The first available local directory is used as the local buffer
   for spilling to remote if any remote directory is configured.
   If a remote directory is configured without any available local
   scratch space, an error is returned during initialization.
   The purpose of this design is to simplify the implementation in
   milestone 1 with fewer changes to the configuration.

Example (setting remote scratch space):
Assume that the directories we have for scratch space:
* Local dir: /tmp/local_buffer, /tmp/local, /tmp/local_sec
* Remote dir: s3a://tmp/remote
The scratch space path is configured in the startup options, and could
have three types of configurations:
1. Pure local scratch space
  --scratch_dirs="/tmp/local"
2. Pure remote scratch space
  --scratch_dirs="s3a://tmp/remote,/tmp/local_buffer:16GB"
3. Mixed local and remote scratch space
  --scratch_dirs="s3a://tmp/romote:200GB,/tmp/local_buffer:1GB,
/tmp/local:2GB, /tmp/local_sec:16GB"
* Type 1: a pure local scratch space with unlimited size.
* Type 2: a pure remote scratch space with a 16GB local buffer.
* Type 3: a mixed local and remote scratch space; the local buffer for
the remote directory is 1GB, the local scratch spaces are 2GB and
16GB, and the remote scratch space bytes limit is 200GB. Remote
scratch space is used only when all of the local spaces are at
capacity.
* Note: The first local directory becomes the local buffer path if a
remote scratch space is registered.

Limitations:
* Only one remote scratch dir is supported.
* The first local scratch dir is used for the buffer of remote scratch
  space if remote scratch dir exists.

Testcases:
* Ran pre-review-test
* Unit Tests added to
  tmp-file-mgr-test/disk-io-mgr-test/buffer-pool-test.
* E2E Tests added to custom_cluster/test_scratch_disk.py.
* Ran Unit Tests:
$IMPALA_HOME/be/build/debug/runtime/buffered-tuple-stream-test
$IMPALA_HOME/be/build/debug/runtime/tmp-file-mgr-test
$IMPALA_HOME/be/build/debug/runtime/bufferpool/buffer-pool-test
$IMPALA_HOME/be/build/debug/runtime/io/disk-io-mgr-test
* Ran E2E Tests:
custom_cluster/test_scratch_disk.py

Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89
Reviewed-on: http://gerrit.cloudera.org:8080/16318
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-02-19 23:54:44 +00:00
Abhishek Rawat
3b9ae415e2 IMPALA-9697: Support priority based scratch directory selection
The '--scratch_dirs' configuration option now supports specifying the
priority of a scratch directory. The lower the numeric value, the
higher the priority. If the priority is not specified, a default
priority of numeric_limits<int>::max() is used.

Valid formats for specifying the priority are:
- <dir-path>:<limit>:<priority>
- <dir-path>::<priority>
Following formats use default priority:
- <dir-path>
- <dir-path>:<limit>
- <dir-path>:<limit>:

The new logic in TmpFileGroup::AllocateSpace() tries to find a target
file using a prioritized round-robin scheme. Files are ordered in
decreasing order of their priority. The priority of a file is the same
as the priority of the related directory. A target file is selected by
always searching the ordered list starting from the file with the
highest priority. If multiple files have the same priority, the target
file is selected in a round-robin manner.
Testing:
- Added unit and e2e tests for priority based spilling logic.

Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Reviewed-on: http://gerrit.cloudera.org:8080/16091
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-06-26 10:20:41 +00:00
Tim Armstrong
ebbe52b4be IMPALA-3766: optionally compress spilled data
Enabled via --disk_spill_compression_codec, which uses
the same syntax as the compression_codec query option.
Recommended codecs are LZ4 and ZSTD. ZSTD supports
specifying a compression level.

The compression is done in TmpFileMgr using a temporary
buffer. Allocation of disk space is reworked slightly
so that the allocation can happen after compression.

The default power-of-two disk block sizes would lead
to a lot of internal fragmentation, so a new strategy
for free space management, similar to that used in
the data cache, can be used with
--disk_spill_punch_holes=true. TmpFileMgr will allocate
a range of the actual compressed size and punch holes
in the file for each range that is no longer needed.
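
An illustrative startup configuration combining the two options (the
ZSTD level is an example):

  --disk_spill_compression_codec=ZSTD:3
  --disk_spill_punch_holes=true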

UncompressedWriteIoBytes is added to the buffer pool
profiles, so that you can see what degree of compression
is achieved. Typically I saw ratios of 2-3x for LZ4 and
ZSTD (with LZ4 toward the lower end and ZSTD toward
the higher end).

Limitations:
The management of the compression buffer memory could
be improved. Ideally it would be integrated with the
buffer pool and use the buffer pool allocator instead
of being done "on the side". We would probably want to
do this before making this the default, for resource
management and performance reasons (doing a malloc()
directly does not use the caching supported by the
buffer pool).

Testing:
* Run buffer pool spilling tests with different combinations of
  the new options.
* Extend existing TmpFileMgr tests for file space allocation to
  run with hole punching enabled.
* Switch a couple of spilling tests to use the new option.
* Add a metrics test to check for scratch leaks.
* Enable the new options by default for end-to-end dockerized
  tests to get additional coverage.
* Add a unit test where allocating compression memory fails,
  both on the read and write path.
* Ran a single-node stress test on TPC-DS SF 1 and TPC-H SF 10
  The peak compression buffer usage was ~40MB.

Perf:
I ran this spilling query using an SSD as the scratch disk:

  set mem_limit=200m;
  select count(distinct l_partkey) from
  tpch30_parquet.lineitem;

The time taken for the second run of each query was:
No compression: 19.59s
LZ4: 18.56s
ZSTD: 20.59s

Change-Id: I9c08ff9504097f0fee8c32316c5c150136abe659
Reviewed-on: http://gerrit.cloudera.org:8080/15454
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
2020-03-31 01:36:44 +00:00
Tim Armstrong
4fb8e8e324 IMPALA-8816: reduce custom cluster test runtime in core
This includes some optimisations and a bulk move of tests
to exhaustive.

Move a bunch of custom cluster tests to exhaustive. I selected
these partially based on runtime (i.e. I looked most carefully
at the tests that ran for over a minute) and the likelihood
of them catching a precommit bug.  Regression tests for specific
edge cases and tests for parts of the code that are very stable
were prime candidates.

Remove an unnecessary cluster restart in test_breakpad.

Merge test_scheduler_error into test_failpoints to avoid an unnecessary
cluster restart.

Speed up cluster starts by ensuring that the default statestore args are
applied even when _start_impala_cluster() is called directly. This
shaves a couple of seconds off each restart. We made the default args
use a faster update frequency - see IMPALA-7185 - but they did not
take effect in all tests.

Change-Id: Ib2e3e7ebc9695baec4d69183387259958df10f62
Reviewed-on: http://gerrit.cloudera.org:8080/13967
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-08-06 21:34:26 +00:00
Tim Armstrong
f9ced753ba IMPALA-7999: clean up start-*d.sh scripts
Delete these wrapper scripts and replace with a generic
start-daemon.sh script that sets environment variables
without the other logic.

Move the logic for setting JAVA_TOOL_OPTIONS into
start-impala-cluster.py.

Remove some options like -jvm_suspend, -gdb, -perf that
may not be used. These can be reintroduced if needed.

Port across the kerberized minicluster logic (which has
probably bitrotted) in case it needs to be revived.

Remove --verbose option that didn't appear to be useful
(it claims to print daemon output to the console,
but output is still redirected regardless).

Removed a level of quoting in custom cluster test argument
handling - this was made unnecessary by properly escaping
arguments with pipes.escape() in run_daemon().

Testing:
* Ran exhaustive tests.
* Ran on CentOS 6 to confirm we didn't reintroduce Popen issue
  worked around by kwho.

Change-Id: Ib67444fd4def8da119db5d3a0832ef1de15b068b
Reviewed-on: http://gerrit.cloudera.org:8080/12271
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-02-05 13:10:08 +00:00
Tim Armstrong
3e4015c601 IMPALA-7179: allow_multiple_scratch_dirs_per_device=true by default
The previous default was often confusing to users of Impala. It is
simpler to do exactly what is asked instead of trying to fix bad
configurations automatically.

Testing:
Ran core tests.

Change-Id: I23394c9949ae4cd0a21d7bb25551371b3198e76c
Reviewed-on: http://gerrit.cloudera.org:8080/10736
Reviewed-by: anujphadke <aphadke@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-16 07:51:42 +00:00
Tim Armstrong
fb5dc9eb48 IMPALA-4835: switch I/O buffers to buffer pool
This is the following squashed patches that were reverted.

I will fix the known issues with some follow-on patches.

======================================================================
IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation

In preparation for switching the I/O mgr to the buffer pool, this
removes and cleans up a lot of code so that the switchover patch starts
from a cleaner slate.

* Remove the free buffer cache (which will be replaced by buffer pool's
  own caching).
* Make memory limit exceeded error checking synchronous (in anticipation
  of having to propagate buffer pool errors synchronously).
* Simplify error propagation - remove the (ineffectual) code that
  enqueued BufferDescriptors containing error statuses.
* Document locking scheme better in a few places, make it part of the
  function signature when it seemed reasonable.
* Move ReturnBuffer() to ScanRange, because it is intrinsically
  connected with the lifecycle of a scan range.
* Separate external ReturnBuffer() and internal CleanUpBuffer()
  interfaces - previously callers of ReturnBuffer() were fudging
  the num_buffers_in_reader accounting to make the external interface work.
* Eliminate redundant state in ScanRange: 'eosr_returned_' and
  'is_cancelled_'.
* Clarify the logic around calling Close() for the last
  BufferDescriptor.
  -> There appeared to be an implicit assumption that buffers would be
     freed in the order they were returned from the scan range, so that
     the "eos" buffer was returned last. Instead just count the number
     of outstanding buffers to detect the last one.
  -> Touching the is_cancelled_ field without holding a lock was hard to
     reason about - violated locking rules and it was unclear that it
     was race-free.
* Remove DiskIoMgr::Read() to simplify the interface. It is trivial to
  inline at the callsites.

This will probably regress performance somewhat because of the cache
removal, so my plan is to merge it around the same time as switching
the I/O mgr to allocate from the buffer pool. I'm keeping the patches
separate to make reviewing easier.

Testing:
* Ran exhaustive tests
* Ran the disk-io-mgr-stress-test overnight

======================================================================
IMPALA-4835: Part 2: Allocate scan range buffers upfront

This change is a step towards reserving memory for buffers from the
buffer pool and constraining per-scanner memory requirements. This
change restructures the DiskIoMgr code so that each ScanRange operates
with a fixed set of buffers that are allocated upfront and recycled as
the I/O mgr works through the ScanRange.

One major change is that ScanRanges get blocked when a buffer is not
available and get unblocked when a client returns a buffer via
ReturnBuffer(). I was able to remove the logic to maintain the
blocked_ranges_ list by instead adding a separate set with all ranges
that are active.

There is also some miscellaneous cleanup included - e.g. reducing the
amount of code devoted to maintaining counters and metrics.

One tricky part of the existing code was that it called
IssueInitialRanges() with empty lists of files and depended on
DiskIoMgr::AddScanRanges() to not check for cancellation in that case.
See IMPALA-6564/IMPALA-6588. I changed the logic to not try to issue
ranges for empty lists of files.

I plan to merge this along with the actual buffer pool switch, but
separated it out to allow review of the DiskIoMgr changes separate from
other aspects of the buffer pool switchover.

Testing:
* Ran core and exhaustive tests.

======================================================================
IMPALA-4835: Part 3: switch I/O buffers to buffer pool

This is the final patch to switch the Disk I/O manager to allocate all
buffer from the buffer pool and to reserve the buffers required for
a query upfront.

* The planner reserves enough memory to run a single scanner per
  scan node.
* The multi-threaded scan node must increase reservation before
  spinning up more threads.
* The scanner implementations must be careful to stay within their
  assigned reservation.

The row-oriented scanners were most straightforward, since they only
have a single scan range active at a time. A single I/O buffer is
sufficient to scan the whole file but more I/O buffers can improve I/O
throughput.

Parquet is more complex because it issues a scan range per column and
the sizes of the columns on disk are not known during planning. To
deal with this, the reservation in the frontend is based on a
heuristic involving the file size and # columns. The Parquet scanner
can then divvy up reservation to columns based on the size of column
data on disk.

I adjusted how the 'mem_limit' is divided between buffer pool and non
buffer pool memory for low mem_limits to account for the increase in
buffer pool memory.

Testing:
* Added more planner tests to cover reservation calcs for scan node.
* Test scanners for all file formats with the reservation denial debug
  action, to test behaviour when the scanners hit reservation limits.
* Updated memory and buffer pool limits for tests.
* Added unit tests for dividing reservation between columns in parquet,
  since the algorithm is non-trivial.

Perf:
I ran TPC-H and targeted perf locally comparing with master. Both
showed small improvements of a few percent and no regressions of
note. Cluster perf tests showed no significant change.

Change-Id: I3ef471dc0746f0ab93b572c34024fc7343161f00
Reviewed-on: http://gerrit.cloudera.org:8080/9679
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2018-04-28 23:41:39 +00:00
Tim Armstrong
161cbe30ff Revert IMPALA-4835 and dependent changes
Revert "IMPALA-6585: increase test_low_mem_limit_q21 limit"

This reverts commit 25bcb258df.

Revert "IMPALA-6588: don't add empty list of ranges in text scan"

This reverts commit d57fbec6f6.

Revert "IMPALA-4835: Part 3: switch I/O buffers to buffer pool"

This reverts commit 24b4ed0b29.

Revert "IMPALA-4835: Part 2: Allocate scan range buffers upfront"

This reverts commit 5699b59d0c.

Revert "IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation"

This reverts commit 65680dc421.

Change-Id: Ie5ca451cd96602886b0a8ecaa846957df0269cbb
Reviewed-on: http://gerrit.cloudera.org:8080/9480
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
2018-03-03 04:22:12 +00:00
Tim Armstrong
24b4ed0b29 IMPALA-4835: Part 3: switch I/O buffers to buffer pool
This is the final patch to switch the Disk I/O manager to allocate all
buffer from the buffer pool and to reserve the buffers required for
a query upfront.

* The planner reserves enough memory to run a single scanner per
  scan node.
* The multi-threaded scan node must increase reservation before
  spinning up more threads.
* The scanner implementations must be careful to stay within their
  assigned reservation.

The row-oriented scanners were most straightforward, since they only
have a single scan range active at a time. A single I/O buffer is
sufficient to scan the whole file but more I/O buffers can improve I/O
throughput.

Parquet is more complex because it issues a scan range per column and
the sizes of the columns on disk are not known during planning. To
deal with this, the reservation in the frontend is based on a
heuristic involving the file size and # columns. The Parquet scanner
can then divvy up reservation to columns based on the size of column
data on disk.

I adjusted how the 'mem_limit' is divided between buffer pool and non
buffer pool memory for low mem_limits to account for the increase in
buffer pool memory.

Testing:
* Added more planner tests to cover reservation calcs for scan node.
* Test scanners for all file formats with the reservation denial debug
  action, to test behaviour when the scanners hit reservation limits.
* Updated memory and buffer pool limits for tests.
* Added unit tests for dividing reservation between columns in parquet,
  since the algorithm is non-trivial.

Perf:
I ran TPC-H and targeted perf locally comparing with master. Both
showed small improvements of a few percent and no regressions of
note. Cluster perf tests showed no significant change.

Change-Id: Ic09c6196b31e55b301df45cc56d0b72cfece6786
Reviewed-on: http://gerrit.cloudera.org:8080/8966
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2018-02-23 04:17:41 +00:00
Tim Armstrong
7007cd85fe IMPALA-5772: failure in test_scratch_disk.TestScratchDir didn't occur
The buffer pool changes reduced the memory requirement for the sort,
which seems to have been enough to get the query to execute without
spilling on S3. Reduce the limit in the test to force it to spill.

Testing:
Ran in a loop locally for an hour. Ran custom cluster tests on S3.

Change-Id: If65fee3e6a4b759d0d18e30a1c30bd48db0f2a54
Reviewed-on: http://gerrit.cloudera.org:8080/7615
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-08-09 00:04:25 +00:00
Tim Armstrong
a98b90bd38 IMPALA-4674: Part 2: port backend exec to BufferPool
Always create global BufferPool at startup using 80% of memory and
limit reservations to 80% of query memory (same as BufferedBlockMgr).
The query's initial reservation is computed in the planner, claimed
centrally (managed by the InitialReservations class) and distributed
to query operators from there.

min_spillable_buffer_size and default_spillable_buffer_size query
options control the buffer size that the planner selects for
spilling operators.
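
For example, these can be set per session like any other query option
(shown here via impyla; host/port and the values are illustrative):

  from impala.dbapi import connect  # impyla client

  conn = connect(host='localhost', port=21050)
  cur = conn.cursor()
  cur.execute('SET MIN_SPILLABLE_BUFFER_SIZE=65536')        # 64KB floor
  cur.execute('SET DEFAULT_SPILLABLE_BUFFER_SIZE=2097152')  # 2MB default
  cur.execute('select 1')  # later queries in this session use these sizes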

Port ExecNodes to use BufferPool:
  * Each ExecNode has to claim its reservation during Open()
  * Port Sorter to use BufferPool.
  * Switch from BufferedTupleStream to BufferedTupleStreamV2
  * Port HashTable to use BufferPool via a Suballocator.

This also makes PAGG memory consumption more efficient (avoiding wasted
buffers) and improves the spilling algorithm:
* Allow preaggs to execute with 0 reservation - if streams and hash tables
  cannot be allocated, they pass through rows (sketched below).
* Halve the buffer requirement for spilling aggs - avoid allocating
  buffers for aggregated and unaggregated streams simultaneously.
* Rebuild spilled partitions instead of repartitioning (IMPALA-2708)
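
A sketch of the pass-through behaviour from the first bullet
(hypothetical names; Python for illustration):

  # Hypothetical sketch: with no reservation, the preagg cannot build a
  # hash table, so it forwards rows unaggregated to the merge
  # aggregation instead of failing the query.
  def process_row(preagg, row):
      if preagg.hash_table is None and not preagg.try_allocate_hash_table():
          preagg.output_unaggregated(row)  # pass-through path
      else:
          preagg.insert_or_aggregate(row)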

TODO in follow-up patches:
* Rename BufferedTupleStreamV2 to BufferedTupleStream
* Implement max_row_size query option.

Testing:
* Updated tests to reflect new memory requirements

Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e
Reviewed-on: http://gerrit.cloudera.org:8080/5801
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-08-05 01:03:02 +00:00
Lars Volker
8b7f876649 IMPALA-4722: Disable log caching in test_scratch_disk
test_scratch_disk fails sporadically when trying to assert the presence
of log messages. This is probably caused by log caching, since after
such failures the log files do contain the lines in question.

I manually tested this by running the tests repeatedly for 2 days (10k
runs).

To make future diagnosis of similar problems easier, this change also
adds more output to assert_impalad_log_contains().
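
The extra output is along these lines (a sketch only; the helper's real
signature and message differ):

  import re

  def assert_log_contains(log_path, line_regex, expected_count=1):
      # On mismatch, report the regex, the counts, and the file checked,
      # so a sporadic failure is diagnosable from the test output alone.
      with open(log_path) as f:
          found = sum(1 for line in f if re.search(line_regex, line))
      assert found == expected_count, (
          "found %d matches of '%s' in %s, expected %d"
          % (found, line_regex, log_path, expected_count))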

Change-Id: I9f21284338ee7b4374aca249b6556282b0148389
Reviewed-on: http://gerrit.cloudera.org:8080/5669
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-01-12 18:58:48 +00:00
Tim Armstrong
95ed4434f2 IMPALA-3202,IMPALA-2079: rework scratch file I/O
Refactor BufferedBlockMgr/TmpFileMgr to push more I/O logic into
TmpFileMgr, in anticipation of it being shared with BufferPool.
TmpFileMgr now handles:
* Scratch space allocation and recycling
* Read and write I/O

The interface is also greatly changed so that it is built around Write()
and Read() calls, abstracting away the details of temporary file
allocation from clients. This means the TmpFileMgr::File class can
be hidden from clients.
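
Roughly, the new shape of the interface (a sketch only; the real API is
C++ and the exact signatures differ):

  # Hypothetical sketch: callers hand TmpFileMgr a buffer and get back
  # an opaque handle; which scratch file and offset back the data is
  # internal to TmpFileMgr.
  def spill_and_restore(tmp_file_mgr, buffer, dest_buffer, on_write_done):
      handle = tmp_file_mgr.write(buffer, on_done=on_write_done)
      # ... later, when the spilled data is needed again:
      tmp_file_mgr.read(handle, dest_buffer)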

Write error recovery:
Also implement write error recovery in TmpFileMgr.

If an error occurs while writing to scratch and we have multiple
scratch directories, we will try one of the other directories
before cancelling the query. File-level blacklisting is used to
prevent excessive repeated attempts to resize a scratch file during
a single query. Device-level blacklisting is not implemented because
it is problematic to permanently take a scratch directory out of use.
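
A minimal sketch of that recovery loop, with hypothetical names (the
real logic lives in TmpFileMgr):

  def write_with_recovery(scratch_files, data):
      # Try each scratch file in turn, blacklisting a file once it fails
      # so the same query does not keep retrying it. Device-level
      # blacklisting is deliberately not done.
      for f in scratch_files:
          if f.blacklisted:
              continue
          try:
              return f.write(data)
          except IOError:
              f.blacklisted = True
      raise IOError('no usable scratch directories; cancelling query')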

To reduce the number of error paths, all I/O errors are now handled
asynchronously. Previously errors creating or extending the file were
returned synchronously from WriteUnpinnedBlock(). This required
modifying DiskIoMgr to create the file if not present when opened.

Also set the default max_errors value in the thrift definition file,
so that it is in effect for backend tests.

Future Work:
* Support for recycling variable-length scratch file ranges. I omitted
  this to avoid making the patch even larger.

Testing:
Updated BufferedBlockMgr unit test to reflect changes in behaviour:
* Scratch space is no longer permanently associated with a block, and
  is remapped every time a new block is written to disk.
* Files are now blacklisted - updated existing tests and enabled the
  previously disabled blacklisting test.

Added some basic testing of recycling of scratch file ranges in
the TmpFileMgr unit test.

I also manually tested the code in two ways. First by removing permissions
for /tmp/impala-scratch and ensuring that a spilling query fails cleanly.
Second, by creating a tiny ramdisk (16M) and running with two scratch
directories: one on /tmp and one on the tiny ramdisk. When spilling, an
out-of-space error is encountered for the tiny ramdisk and Impala spills
the remaining data (72M) to /tmp.

Change-Id: I8c9c587df006d2f09d72dd636adafbd295fcdc17
Reviewed-on: http://gerrit.cloudera.org:8080/5141
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-01-05 02:26:24 +00:00
Dan Hecht
ffa7829b70 IMPALA-3918: Remove Cloudera copyrights and add ASF license header
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:

http://www.apache.org/legal/src-headers.html#headers

Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
   http://www.apache.org/legal/src-headers.html#notice
   to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
   on the website.

Much of this change was automatically generated via:

git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]

Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.

[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
    modification to ORIG_LICENSE to match Impala's license text.

Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-08-09 08:19:41 +00:00
Taras Bobrovytsky
609b80410e Clean up Python test import statements
Many of our test scripts have import statements that look like
"from xxx import *". It is better practice to explicitly name what
needs to be imported; this commit adopts that practice and also
removes unused import statements.
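
For instance (module and names shown are illustrative):

  # Before: pulls every public name into the test module's namespace.
  from tests.common.impala_test_suite import *

  # After: the dependency is explicit, and unused names stand out.
  from tests.common.impala_test_suite import ImpalaTestSuite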

Change-Id: I6a33bb66552ae657d1725f765842f648faeb26a8
Reviewed-on: http://gerrit.cloudera.org:8080/3444
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
2016-07-15 23:26:18 +00:00
Tim Armstrong
c1093ed861 IMPALA-3669: test_scratch_disk fails on S3
Make the test deterministic by using max_block_mgr_memory instead of
mem_limit, so that the non-deterministic scanner memory usage does not
influence the spilling behaviour of the queries.

Testing:
Ran the test locally to confirm that it succeeded. Also manually
computed the memory requirement. The data size to be sorted is ~220MB,
so with a 64MB block manager limit per node, at least one node must
spill.

Change-Id: I9525a029ac020bb5b8bea210a741c9f9c5ec3c75
Reviewed-on: http://gerrit.cloudera.org:8080/3318
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Silvius Rus <srus@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2016-06-09 17:31:00 -07:00
Michael Brown
067af1957c IMPALA-3614: work around pytest bugs causing custom cluster test skips
All versions of pytest contain various bugs regarding test marking
(including skips) when tests are both:

1. class-level marked
2. inherited

More info is available in IMPALA-3614 and IMPALA-2943, but the gist is
that it's possible for some tests to be skipped when they shouldn't be.
This is happening pretty badly with the custom cluster tests, because
CustomClusterTestSuite has a class-level skipif mark.

The easiest workaround for now is to remove the pytest skipif mark in
CustomClusterTestSuite and skip using explicit pytest.skip() in the
setup_class() method. Some CustomClusterTestSuite children implemented
their own setup_* methods; I adjusted them both to clean them up and to
call the parent methods properly via super().
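
The workaround looks roughly like this (the skip condition shown is
hypothetical):

  import pytest
  from tests.common.impala_test_suite import ImpalaTestSuite

  class CustomClusterTestSuite(ImpalaTestSuite):
      @classmethod
      def setup_class(cls):
          # Replaces the unreliable class-level @pytest.mark.skipif: the
          # decision now runs inside setup, which pytest honours even
          # for inherited tests.
          if cls.exploration_strategy() != 'exhaustive':  # hypothetical
              pytest.skip('only runs in exhaustive mode')
          super(CustomClusterTestSuite, cls).setup_class()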

Testing:

I ran the following combinations of all the custom cluster tests:

DEBUG   / HDFS  / core
RELEASE / HDFS  / exhaustive
DEBUG   / LOCAL / core
DEBUG   / S3    / core

Before, we'd get situations in which most of the tests were skipped.
Consider the RELEASE/HDFS/exhaustive situation:

  custom_cluster/test_admission_controller.py .....
  custom_cluster/test_alloc_fail.py ss
  custom_cluster/test_breakpad.py sssss
  custom_cluster/test_delegation.py sss
  custom_cluster/test_exchange_delays.py ss
  custom_cluster/test_hdfs_fd_caching.py s
  custom_cluster/test_hive_parquet_timestamp_conversion.py ss
  custom_cluster/test_insert_behaviour.py ss
  custom_cluster/test_legacy_joins_aggs.py s
  custom_cluster/test_parquet_max_page_header.py s
  custom_cluster/test_permanent_udfs.py sss
  custom_cluster/test_query_expiration.py sss
  custom_cluster/test_redaction.py ssss
  custom_cluster/test_s3a_access.py s
  custom_cluster/test_scratch_disk.py ssss
  custom_cluster/test_session_expiration.py s
  custom_cluster/test_spilling.py ssss
  authorization/test_authorization.py ss
  authorization/test_grant_revoke.py s

Now, more tests run appropriately:

  custom_cluster/test_admission_controller.py .....
  custom_cluster/test_alloc_fail.py ss
  custom_cluster/test_breakpad.py sssss
  custom_cluster/test_delegation.py ...
  custom_cluster/test_exchange_delays.py ss
  custom_cluster/test_hdfs_fd_caching.py .
  custom_cluster/test_hive_parquet_timestamp_conversion.py ..
  custom_cluster/test_insert_behaviour.py ..
  custom_cluster/test_kudu_not_available.py .
  custom_cluster/test_legacy_joins_aggs.py .
  custom_cluster/test_parquet_max_page_header.py .
  custom_cluster/test_permanent_udfs.py ...
  custom_cluster/test_query_expiration.py ...
  custom_cluster/test_redaction.py ....
  custom_cluster/test_s3a_access.py s
  custom_cluster/test_scratch_disk.py ....
  custom_cluster/test_session_expiration.py .
  custom_cluster/test_spilling.py ....
  authorization/test_authorization.py ..
  authorization/test_grant_revoke.py .

Change-Id: Ie301b69718f8690322cc3b4130fb1c715344779c
Reviewed-on: http://gerrit.cloudera.org:8080/3265
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Michael Brown <mikeb@cloudera.com>
2016-06-06 17:34:07 -07:00
Taras Bobrovytsky
9dcf857ddb IMPALA-3368: Fix race in test_scratch_disk.py
The problem is that xdist spins up multiple processes and each
process tries to create the same temp dirs. This commit fixes the
issue by creating a randomly named dir to avoid the conflict.
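
The fix amounts to something like this (illustrative):

  import tempfile

  # Each pytest-xdist worker gets its own uniquely named directory, so
  # concurrent processes never race to create the same path.
  scratch_dir = tempfile.mkdtemp(prefix='test_scratch_disk_')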

Change-Id: Ic0764843ace00aef8c9b01139906e01ab5213047
Reviewed-on: http://gerrit.cloudera.org:8080/2817
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:35 -07:00
Tim Armstrong
1d2afcfec2 IMPALA-2079: Part 1: report non-writable scratch dirs at startup
Previously Impala could erroneously decide to use non-writable scratch
directories, e.g. if /tmp/impala-scratch already exists and is not
writable by the current user.

With this change, if we cannot remove and recreate a fresh scratch directory,
it is not used.  If we have no valid scratch directories, we log an
error and continue startup.
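
In sketch form (Python for illustration; the real check runs in the C++
startup path):

  import os
  import shutil

  def usable_scratch_dirs(candidate_dirs):
      usable = []
      for d in candidate_dirs:
          try:
              # Remove any stale directory (it may be owned by another
              # user) and recreate it to prove we can write there.
              if os.path.exists(d):
                  shutil.rmtree(d)
              os.makedirs(d)
              usable.append(d)
          except OSError:
              pass  # not writable: skip it rather than failing startup
      return usable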

Add unit test for CreateDirectory to test behavior for success and
failure cases.

Add system tests to check logging and query execution in various
scenarios where we do not have scratch available.

Modify FilesystemUtil to use non-exception-throwing Boost functions to
avoid unhandled exceptions escaping into the rest of the Impala
codebase, which does not expect the use of exceptions.

Change-Id: Icaa8429051942424e1d811c54bde10102ac7f7b3
Reviewed-on: http://gerrit.cloudera.org:8080/565
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-08-14 00:38:22 +00:00
Casey Ching
074e5b4349 Remove hashbang from non-script python files
Many python files had a hashbang and the executable bit set though
they were not intended to be run as standalone scripts. That makes
it very difficult to determine which python files are actually scripts.
A future patch will update the hashbang in real python scripts so they
use $IMPALA_HOME/bin/impala-python.
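
With the hashbangs gone, a genuine script can be recognized by its
first line and mode bits, e.g. (illustrative):

  import os

  def looks_like_script(path):
      # After this change, only real entry points keep a '#!' first line
      # and the executable bit.
      with open(path) as f:
          first_line = f.readline()
      return first_line.startswith('#!') and os.access(path, os.X_OK)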

Change-Id: I04eafdc73201feefe65b85817a00474e182ec2ba
Reviewed-on: http://gerrit.cloudera.org:8080/599
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
2015-08-04 05:26:07 +00:00
Taras Bobrovytsky
7faaa65996 Added order by query tests
- Added static order by tests to test_queries.py and QueryTest/sort.test
- test_order_by.py also contains tests with static queries that are run with
  multiple memory limits.
- Added stress, scratch disk and failpoints tests
- Incorporated Srinath's change that copied all order by with limit tests into
  the top-n.test file

Extra time required:

Serial:
scratch disk: 42 seconds
test queries sort : 77 seconds
test sort: 56 seconds
sort stress: 142 seconds
TOTAL: 5 min 17 seconds

Parallel(8 threads):
scratch disk: 40 seconds
test queries sort: 42 seconds
test sort: 49 seconds
sort stress: 93 seconds
TOTAL: 3 min 44 sec

Change-Id: Ic5716bcfabb5bb3053c6b9cebc9bfbbb9dc64a7c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2820
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3205
2014-06-20 13:35:10 -07:00