Initializing the impala-python virtualenv takes a couple of minutes,
so it is useful to do that in parallel with the rest of the build.
This moves the impala-python initialization to its own step
in the CMake build. It stops using impala-python for commands
invoked from buildall.sh or the CMake build to avoid premature
or concurrent initializations of impala-python. Then, it adds
a dedicated step to initialize impala-python.
Testing:
- Ran a core job and a couple of builds
- Rebuilt and verified that impala-python is not reinitialized
if it is already initialized
Change-Id: Ieff51263c55bd234028fed7101c94b4a928590f0
Reviewed-on: http://gerrit.cloudera.org:8080/16607
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds the --kudu_client_connection_negotiation_timeout_ms
flag to control the client-side connection negotiation timeout in the
Kudu client working as part of Impala's BE. Since [1] has been
addressed for the Kudu C++ client, it makes sense to provide a knob
to customize the timeout. That should help to address cases where very
busy cluster nodes hosting Kudu tablet servers aren't fast enough to
negotiate a new connection within the default timeout interval (3 sec),
as mentioned in the description of [1].
[1] https://issues.apache.org/jira/browse/KUDU-2966
Change-Id: I1223187318691da47082608356547f6d78144466
Reviewed-on: http://gerrit.cloudera.org:8080/16705
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch extends the blacklist functionality by adding an executor
node to the blacklist if a query fails due to a disk failure during
spill-to-disk. It also classifies disk error codes and defines a
blacklistable error set for non-transient disk errors. The coordinator
blacklists an executor only if the executor hit a blacklistable error
during spill-to-disk.
Adds a new debug action to simulate a disk write error during
spill-to-disk. To use it, specify in the query options:
'debug_action': 'IMPALA_TMP_FILE_WRITE:<hostname>:<port>:<action>'
where <hostname> and <port> identify the impalad that executes the
fragment instances; <port> is the BE KRPC port (default 27000).
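For illustration, a minimal sketch of triggering the debug action from
Python with the impyla client (hostname, port, and the 'FAIL' action
token are placeholder assumptions):

from impala.dbapi import connect

conn = connect(host='localhost', port=21050)
cur = conn.cursor()
# Inject a write error on the impalad at impalad-1:27000 when it spills.
cur.execute("SET DEBUG_ACTION='IMPALA_TMP_FILE_WRITE:impalad-1:27000:FAIL'")
cur.execute('select count(*) from tpch.lineitem a join tpch.lineitem b '
            'on a.l_orderkey = b.l_orderkey')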
Adds new test cases for blacklist and query-retry to cover the code
changes.
Testing:
- Passed new test cases.
- Passed exhaustive test.
- Manually simulated disk failures in scratch directories on nodes
of a cluster, verified that the nodes were blacklisted as
expected.
Change-Id: I04bfcb7f2e0b1ef24a5b4350f270feecd8c47437
Reviewed-on: http://gerrit.cloudera.org:8080/16949
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Some cancel tests have been flaky, and their logs didn't reveal the
root cause of the failures. This adds some extra logging so that we
can see a bit more of the nature of the failure.
The extra log message contains:
- Query SQL
- Message of the exception thrown during fetching the results
- Query Status line from the query profile
Change-Id: Ied7100a9ea2e2f0611cf8e328e589b4c8e5d5100
Reviewed-on: http://gerrit.cloudera.org:8080/16985
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently Impala writes double slashes in the paths of data files
for non-partitioned Iceberg tables. Unnormalized paths can cause
problems later.
This patch removes the redundant slashes.
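The idea amounts to collapsing repeated separators while leaving the
URI scheme intact; a rough Python sketch of the normalization (purely
illustrative, not the actual fix):

import re

def normalize_path(path):
    # Collapse runs of '/' that are not part of a scheme like 'hdfs://'.
    return re.sub(r'(?<!:)/{2,}', '/', path)

print(normalize_path('hdfs://nn:8020/warehouse//ice_tbl//data/f.parq'))
# -> hdfs://nn:8020/warehouse/ice_tbl/data/f.parq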
Testing:
* Tested manually by inspecting the manifest files of the
Iceberg tables. Used both non-partitioned and partitioned tables.
Change-Id: If5ecac78102ed35710dd70a18edc71f6e891e748
Reviewed-on: http://gerrit.cloudera.org:8080/16993
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for the TRUNCATE statement for
Iceberg tables.
The TRUNCATE operation creates a new snapshot for the target
table that doesn't have any data files. Table and column stats
are also cleared. This patch also fixes a bug that caused
table/column stats not to be propagated.
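For example (a sketch using the impyla client; the table name is
hypothetical):

from impala.dbapi import connect

conn = connect(host='localhost', port=21050)
cur = conn.cursor()
cur.execute('TRUNCATE TABLE ice_t')   # new snapshot with no data files
cur.execute('SELECT count(*) FROM ice_t')
print(cur.fetchall())                 # [(0,)]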
Testing:
* added e2e tests for both partitioned and unpartitioned tables
Change-Id: I6116c7c36aba871c0be79f499e0ac618072ca7b8
Reviewed-on: http://gerrit.cloudera.org:8080/16987
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: wangsheng <skyyws@163.com>
With newer versions of Iceberg, TestIcebergTable::test_create_iceberg_tables
fails with a ClassNotFoundException for
org.apache.hadoop.hive.common.type.Date.
This adds that missing location to impala-minimal-hive-exec.
Testing:
- Ran TestIcebergTable::test_create_iceberg_tables with newer Iceberg
Change-Id: I3fc33ff17489c2bd54d2ec8798ec7a3e5cfb051c
Reviewed-on: http://gerrit.cloudera.org:8080/17005
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
TLS versions < 1.2 are now considered insecure. This patch improves
Impala's default security.
This is made possible now in part because Impala 4.0 dropped support
for Python versions < 2.7.9 (or 2.7.5 on certain distributions where
it has been patched), as lower Python versions do not support TLS 1.2.
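For reference, a minimal sketch of the Python-side capability this
relies on (ssl.PROTOCOL_TLSv1_2 only exists on Python >= 2.7.9 / 3.4):

import ssl

# Fails with AttributeError on Pythons too old to speak TLS 1.2.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)
print(ssl.OPENSSL_VERSION)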
Testing:
- Existing SSL tests are updated to reflect the new default.
Change-Id: Ifed66646b041a061f9db92744710aef7453f39e4
Reviewed-on: http://gerrit.cloudera.org:8080/16988
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When ENABLE_OUTER_JOIN_TO_INNER_TRANSFORMATION is set to true, the
planner will simplify outer joins even if the predicate contains a
case expr or a conditional function referencing both sides of the
outer join.
However, such a predicate may not be null-rejecting, and if the outer
join is simplified anyway, the result is incorrect. E.g.
t1.b > coalesce(t1.c, t2.c) can return true if t2.c is null, so it is
not a null-rejecting predicate for t2.
The fix supports the case where the predicate has two operands and the
operator is one of (=, !=, >, <, >=, <=), and either
1. one of the operands, or
2. if the operand is an arithmetic expression, one of its children,
does not contain a conditional builtin function or case expr and has a
tuple id among the outer-joined tuples.
E.g. t1.b > coalesce(t2.c, t1.c) and t1.b + coalesce(t2.c, t1.c) >
coalesce(t2.c, t1.c) are null-rejecting predicates for t1.
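For illustration, the transformation is gated on the query option (a
sketch via the impyla client; table names are hypothetical):

from impala.dbapi import connect

conn = connect(host='localhost', port=21050)
cur = conn.cursor()
cur.execute('SET ENABLE_OUTER_JOIN_TO_INNER_TRANSFORMATION=true')
# Not null-rejecting for t2: the planner must NOT simplify this join.
cur.execute('select * from t1 left outer join t2 on t1.id = t2.id '
            'where t1.b > coalesce(t1.c, t2.c)')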
Testing:
* Add new plan tests in outer-to-inner-joins.test
* Add new query tests to verify the correctness on transformation
Change-Id: I84a3812f4212fa823f3d1ced6e12f2df05aedb2b
Reviewed-on: http://gerrit.cloudera.org:8080/16845
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
For convenience, this patch adds support for the old-style
CREATE TABLE ... PARTITIONED BY ... syntax for Iceberg tables.
Users should be able to write the following:
CREATE TABLE ice_t (i int)
PARTITIONED BY (p int)
STORED AS ICEBERG;
This should be equivalent to:
CREATE TABLE ice_t (i int, p int)
PARTITION BY SPEC (p IDENTITY)
STORED AS ICEBERG;
Please note that the old-style CREATE TABLE statement creates
IDENTITY-partitioned tables. For other partition transforms the
users must use the new, more generic syntax.
Hive also supports the old PARTITIONED BY syntax with the same
behavior.
Testing:
* added e2e tests
Change-Id: I789876c161bc0987820955aa9ae01414e0dcb45d
Reviewed-on: http://gerrit.cloudera.org:8080/16979
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
What works:
* A single node cluster can be started up with docker-compose
* HMS data is stored in Derby database in a docker volume
* Filesystem data is stored in a shared docker volume, using the
localfs support in the Hadoop client.
* A Kudu cluster with a single master can be optionally added on
to the Impala cluster.
* TPC-DS data can be loaded automatically by a data loading container.
We need to set up a docker network called quickstart-network,
purely because docker-compose insists on generating network names
with underscores, which are part of the FQDN and end up causing
problems with Java's URL parsing, which rejects these technically
invalid domain names.
How to run:
Instructions for running the quickstart cluster are in
docker/README.md.
How to build containers:
./buildall.sh -release -noclean -notests -ninja
ninja quickstart_hms_image quickstart_client_image docker_images
How to upload containers to dockerhub:
IMPALA_QUICKSTART_IMAGE_PREFIX=timgarmstrong/
for i in impalad_coord_exec impalad_coordinator statestored \
impalad_executor catalogd impala_quickstart_client \
impala_quickstart_hms
do
docker tag $i ${IMPALA_QUICKSTART_IMAGE_PREFIX}$i
docker push ${IMPALA_QUICKSTART_IMAGE_PREFIX}$i
done
I pushed containers built from commit f260cce22, which
was branched from 6cb7cecacf on master.
Misc other stuff:
* Added more metadata to all images.
TODO:
* Test and instructions to run against Kudu quickstart
* Upload latest version of containers before merging.
Change-Id: Ifc0b862af40a368381ada7ec2a355fe4b0aa778c
Reviewed-on: http://gerrit.cloudera.org:8080/15966
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Impala shell outputs a batch of rows using OutputStream. Inside
OutputStream, output to a file is handled slightly differently from
output that is written to stdout. When writing to stdout we use print()
(which appends a newline) while when writing to a file we use write()
(which adds nothing). This difference was introduced in IMPALA-3343 so
this bug may be a regression introduced then. To ensure that output is
the same in either case we need to add a newline after writing each
batch of rows to a file.
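A rough Python sketch of the asymmetry and the fix (names are
hypothetical, not the actual impala-shell code):

def write_batch(rows, out_file=None):
    data = '\n'.join(rows)
    if out_file:
        out_file.write(data)
        out_file.write('\n')  # write() adds nothing, so terminate the batch
    else:
        print(data)           # print() already appends a newline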
TESTING:
Added a new test for this case.
Change-Id: I078a06c54e0834bc1f898626afbfff4ded579fa9
Reviewed-on: http://gerrit.cloudera.org:8080/16966
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
A unicode character can be encoded into 1-4 bytes in UTF-8. String
functions will return undesired results when the input contains unicode
characters, because we deal with a string as a byte array. For instance,
length() returns the length in bytes, not in unicode characters.
UTF-8 is the dominant unicode encoding used in the Hadoop ecosystem.
This patch adds UTF-8 support in some string functions so they can have
UTF-8 aware behavior. For compatibility with the old versions, a new
query option, UTF8_MODE, is added for turning on/off the UTF-8 aware
behavior. Currently, only length(), substring() and reverse() support
it. Support for other functions will be added in later patches.
String functions will check the query option and switch to use the
desired implementation. It's similar to how we use the decimal_v2 query
option in builtin functions.
For easy testing, the UTF-8 aware versions of the string functions are
also exposed as builtin functions (named utf8_*, e.g. utf8_length).
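For example (a sketch via the impyla client; the counts follow from
'你好' being 2 code points and 6 bytes in UTF-8):

from impala.dbapi import connect

conn = connect(host='localhost', port=21050)
cur = conn.cursor()
cur.execute("select length('你好')")
print(cur.fetchall())  # [(6,)] -- length in bytes
cur.execute('SET UTF8_MODE=true')
cur.execute("select length('你好')")
print(cur.fetchall())  # [(2,)] -- length in unicode characters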
Tests:
- Add BE tests for utf8 functions.
- Add e2e tests for the UTF8_MODE query option.
Change-Id: I0aaf3544e89f8a3d531ad6afe056b3658b525b7c
Reviewed-on: http://gerrit.cloudera.org:8080/16908
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When reading from the data cache, the disk IO thread first gets a file
handle, then it checks the data cache for a hit. The file handle is only
used if there is a data cache miss. It is not used when data cache hit
and in turns becomes an overhead. This patch move the file handle
retrieval later when data cache miss hapens.
Testing:
- Add custom cluster test test_no_fd_caching_on_cached_data.
- Pass core tests.
Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91
Reviewed-on: http://gerrit.cloudera.org:8080/16963
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
impala-profile-tool is a new dependency for end-to-end tests.
The tool is built together with all the other backend tests
(so the buildall.sh flag '-notests' turns off building it), but it is
actually used in the parallel phase of the end-to-end tests.
This poses a problem for Docker-based builds for the following reasons:
- Docker-based tests run BE, FE and various phases of the EE test in
separate Docker containers for parallel executions
- Test binaries are only built inside the container running BE tests to
cut down on the build time and the size of the Docker image that all test
containers are based on.
- This means that the EE_TEST_PARALLEL container will miss the tool
required for running the tests designed to exercise it.
The solution is to build the tool early, at the end of the build phase
running in the build container. There is already another such tool
built there (parquet-reader) for a similar reason, so just add
impala-profile-tool to the same 'make' command there.
Tested by running BE_TEST and EE_TEST_PARALLEL phases in a Docker-based
build.
Change-Id: I60e78ea883f3057c59a345feca38ef08a7f6a0b8
Reviewed-on: http://gerrit.cloudera.org:8080/16965
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This fixes the analytic push down optimization for the case where
the ORDER BY expressions are compatible with the partitioning of the
analytic *and* there is a rank() or row_number() predicate.
In this case the rows returned are going to come from the first partitions,
i.e. if the limit is 100 and we go through the partitions in order until
the row count adds up to 100, then we know that the rows must come from
those partitions.
The problem is that predicates can discard rows from the partitions,
meaning that a limit naively pushed down to the top-n will filter
out rows that could be returned from the query.
We can avoid the problem in the case where the partition limit >=
order by limit, however.
In this case the relevant set of partitions is the set of partitions
that include the first <limit> rows, since the top-level limit
generally kicks in before the per-partition limit. The only twist
is that the orderings may be different within a partition, so we
need to make sure to include all of the rows in the final partition.
The solution implemented in this patch is to increase the pushed
down limit so that it is always guaranteed to include all of the
rows in the final partition to be returned. E.g. if you had a
row_number() <= 100 predicate and limit 100, if you pushed down
limit 200, then you'd be guaranteed to capture all of the rows
in the final partition. One case we need to handle is that,
in the case of a rank() predicate, we can have more than that
number of rows in the partition because of ties.
This patch implements tie handling in the backend (I took most
of that implementation from my in-progress partitioned top-n patch,
with the intention of rebasing that onto this patch).
This also adds a check against TOPN_BYTES_LIMIT so that
the limit can't be increased to an arbitrarily large value.
Testing:
* Add new planner test with negative case where it's rejected
because the transformation is incorrect.
* Update other planner tests to reflect new limit calculation
+ tie handling required for correctness.
* Add planner test for very high rank predicate that overflows int32
* Add planner test that checks TOPN_BYTES_LIMIT handling
* Add planner test that checks that dense_rank() can't be pushed.
* Existing planner tests already have adequate coverage for the
predicates <=, <, = and row_number().
* Add some end-to-end tests that repro bugs that fall under the jira
* Add an end-to-end test on TPC-H with more data to exercise the
tie-handling logic in the execnode more.
Perf:
Ran TPC-DS q67 with mt_dop=1 on a single node, confirmed there was
no measurable change in performance as a result of this patchset.
Ran TPC-H scale 30 on a single node, no significant perf change.
Ran a targeted query to check for regressions in the top-n node.
The elapsed time for this targeted query did not change:
use tpch30_parquet;
set mt_dop=1;
select l_extendedprice from lineitem
order by 1 limit 100
Change-Id: I801d7799b0d649c73d2dd1703729a9b58a662509
Reviewed-on: http://gerrit.cloudera.org:8080/16942
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch extends COMPUTE INCREMENTAL STATS to support a list of
columns. Only the parser needed modification; no changes are needed
in other modules because they already support column lists.
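For example (a sketch via the impyla client; the column-list form is
assumed to mirror the existing COMPUTE STATS syntax, and the table and
column names are hypothetical):

from impala.dbapi import connect

conn = connect(host='localhost', port=21050)
cur = conn.cursor()
# Compute incremental stats only for the two listed columns.
cur.execute('COMPUTE INCREMENTAL STATS sales (store_id, amount)')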
Change-Id: I4dcc2d4458679c39581446f6d87bb7903803f09b
Reviewed-on: http://gerrit.cloudera.org:8080/16947
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
This patch imports the functionality needed for Theta approximate
algorithm from Apache DataSketches.
First, I updated our existing snapshot of DataSketches to the
following commit: b2f749ed5ce6ba650f4259602b133c310c3a5ee4
("Merge pull request #182 from chufucun/include_type").
This affects files originated from hll/, kll/ and theta/ directories
of the DataSketches repo.
Then I copied all the files needed for Theta into our snapshot
directory.
Browse the source files here:
b2f749ed5c
Change-Id: I8485d6829f50b130c84ec8bef0a4b5895255ba6c
Reviewed-on: http://gerrit.cloudera.org:8080/16959
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-9865 part 2 made some expected outcomes of the above test steps
stricter. Unfortunately these stricter results are only valid when the
tests are run on an HDFS file system in the context of a local
minicluster, breaking the same test on S3 and EC storage.
The patch disables the test step when run outside the context of a local
minicluster HDFS.
Change-Id: If8a179937c9c7c690dd2630549464dbe6aa1b834
Reviewed-on: http://gerrit.cloudera.org:8080/16964
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
To make impala-shell compatible with Python3, we explicitly
distinguish bytes and text in Python2 by decoding the bytes for all
inputs.
Regression 1: multiple queries in one line with unicode chars will break
In precmd() of impala-shell, if there are multiple queries present in
one input line, we split it into individual queries (by
sqlparse.split()) and append them back to the 'cmdqueue'. They will be
passed to precmd() again. In our Python2 implementation, precmd()
expects them to be of str type, and will decode them into unicode.
However, sqlparse.split() returns unicode. Calling decode() on a
unicode var makes Python2 implicitly encode it to str first, which
may raise UnicodeEncodeError since the implicit encoding uses 'ascii'.
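A minimal Python2 reproduction of this failure mode:

# Python2 only: decode() on a unicode value implicitly *encodes* it to
# str first, using the default 'ascii' codec.
s = u'\u4f60\u597d'   # unicode, as returned by sqlparse.split()
s.decode('utf-8')     # raises UnicodeEncodeError: 'ascii' codec ...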
Regression 2: multi-line query with unicode chars will break when
command history is enabled
In _check_for_command_completion(), when calling
readline.replace_history_item() in Python2, we encode completed_cmd
into bytes. However, we shouldn't replace completed_cmd with the
encoded value, since the expected return type is unicode.
Tests:
- Add tests for these two regressions in Python2.
Change-Id: Icc4a8d31311a5c59e5fc0e65fe09f770df41bea4
Reviewed-on: http://gerrit.cloudera.org:8080/16960
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This issue happened in a core ASAN build.
According to the log messages, one backend sent a status report with
instance_exec_status as done for all assigned instances without error,
then it sent a last status report with an error. The coordinator
treated the backend state as done after it processed the status report
with instance_exec_status as done, but did not apply the last status
report with the error to the overall backend state.
This caused the backend to receive a response with status OK for the
last status report, and hence hit a DCHECK error.
This patch fixes the race between updating the 'Query State' and
updating the fragment instance state when hitting an error during the
execution of a fragment instance. The backends will not send a status
report with the fragment instance state as "completed" without error
after hitting an error.
Testing:
- Manual tests
I could only reproduce the situation by adding some artificial
delays in the beginning of QueryState::ErrorDuringExecute()
when repeatedly running test case test_spilling.py::
TestSpillingDebugActionDimensions::test_spilling_naaj for
Impala ASAN build.
Verified that the issue did not happen after applying this
patch.
- Passed exhaustive test.
Change-Id: Ic12a80e20ddc11e32349edfec2bd16338c24b841
Reviewed-on: http://gerrit.cloudera.org:8080/16900
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for INSERT INTO Iceberg tables that use
partition transforms. Partition transforms are functions that
calculate partition data from row data.
There are the following partition transforms in Iceberg:
https://iceberg.apache.org/spec/#partition-transforms
* IDENTITY
* BUCKET
* TRUNCATE
* YEAR
* MONTH
* DAY
* HOUR
INSERT INTO identity-partitioned Iceberg tables is already supported.
This patch adds support for the rest of the transforms.
We create the partitioning expressions in InsertStmt. Based on these
expressions data are automatically shuffled and sorted by the backend
executors before rows are given to the table sink operators. The table
sink operator writes the partitions one-by-one and creates a
human-readable partition path for them.
In the end, we will convert the partition path to partition data and
create Iceberg DataFiles with information about the files written.
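A sketch of the end-to-end flow (via the impyla client; the postfix
transform syntax mirrors the 'p IDENTITY' example earlier in this log
and should be treated as illustrative, not exact grammar):

from impala.dbapi import connect

conn = connect(host='localhost', port=21050)
cur = conn.cursor()
cur.execute('CREATE TABLE ice_events (id bigint, ts timestamp) '
            'PARTITION BY SPEC (ts DAY) STORED AS ICEBERG')
# Rows are shuffled/sorted on the partition expressions before they
# reach the table sink, which writes one partition at a time.
cur.execute('INSERT INTO ice_events SELECT id, ts FROM staged_events')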
Testing:
* added planner test
* added e2e tests
Change-Id: I3edf02048cea78703837b248c55219c22d512b78
Reviewed-on: http://gerrit.cloudera.org:8080/16939
Reviewed-by: wangsheng <skyyws@163.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The mask functions in Hive are implemented through GenericUDFs, which
can accept an arbitrary number of function signatures. Impala
currently doesn't support GenericUDFs, so we provide builtin mask
functions with limited overloads.
This patch adds some missing overloads that could be used by Ranger
default masking policies, e.g. MASK_HASH, MASK_SHOW_LAST_4,
MASK_DATE_SHOW_YEAR, etc.
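For example (a sketch via the impyla client; treat the exact overload
names and signatures as illustrative, following Hive's mask UDFs):

from impala.dbapi import connect

conn = connect(host='localhost', port=21050)
cur = conn.cursor()
# mask() applies the default masking; mask_hash() backs MASK_HASH.
cur.execute("select mask('TestString-123'), mask_hash('secret')")
print(cur.fetchall())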
Tests:
- Add test coverage on all default masking policies applied on all
supported types.
Change-Id: Icf3e70fd7aa9f3b6d6b508b776696e61ec1fcc2e
Reviewed-on: http://gerrit.cloudera.org:8080/16930
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds a --profile_verbosity option for impala-profile-tool with
the following levels:
* 0: minimal
* 1: legacy - matches the old output; this is still the default
* 2: default - basic descriptive stats, used for V2 profile.
* 3: extended
* 4: full
This will help with transition to the V2 profile because we
can have a nice, high-level, readable text profile by default
with the option to produce more detailed profiles and alternate
views of the profile from the thrift profile.
Use the profile version in impala-profile-tool to dump the
more verbose output for the V2 profile while preserving the
same output for the legacy profile.
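A sketch of invoking the tool at the new verbosity (assuming the
profile log is supplied on stdin; paths are hypothetical):

import subprocess

with open('impala_profile_log_v2', 'rb') as log:
    out = subprocess.run(['impala-profile-tool', '--profile_verbosity=2'],
                         stdin=log, capture_output=True, check=True)
print(out.stdout.decode())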
Reduce verbosity of v2 profile output - only include mean/min/max
by default. I intend to refine the output at the different
verbosity levels for the v2 profiles further as part of IMPALA-9382;
it is still fairly noisy.
Fix output with/without gen_experimental_profile - there
was a small difference in that the summary stats were not
output in the averaged profile.
Testing:
* Add an end-to-end test that generates output for a small
profile log and compares against expected files.
* Tweak other profile tests to reflect changes to output.
Change-Id: I82618a813e29af7996dfaed78873b2a73bc0231d
Reviewed-on: http://gerrit.cloudera.org:8080/16881
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Insertion of invalid timestamps causes Impala to crash when it uses
the INT64 Parquet timestamp types.
This patch fixes the error by checking for null values in
Int64TimestampColumnWriterBase::ConvertValue().
Testing:
* added e2e tests
Change-Id: I74fb754580663c99e1d8c3b73f8d62ea3305ac93
Reviewed-on: http://gerrit.cloudera.org:8080/16951
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
A recent patch (IMPALA-9930) introduces a new admission control rpc
service, which can be configured to perform admission control for
coordinators. In that patch, the admission service runs in an impalad.
This patch separates the service out to run in a new daemon, called
the admissiond. It also integrates this new daemon with the build
infrastructure around Docker.
Some notable changes:
- Adds a new class, AdmissiondEnv, which performs the same function
for the admissiond as ExecEnv does for impalads.
- The '/admission' http endpoint is exposed on the admissiond's webui
if the admission control service is in use, otherwise it is exposed
on coordinator impalad's webuis.
- start-impala-cluster.py takes a new flag --enable_admission_service
which configures the minicluster to have an admissiond with all
coordinators using it for admission control (see the sketch below).
- Coordinators are now configured to use the admission service by
specifying the startup flag --admission_service_host. This is
intended to mirror the configuration of the statestored/catalogd
location.
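For illustration, a minimal sketch of bringing up a minicluster with
the new daemon (run from the Impala repo root):

import subprocess

# --enable_admission_service starts an admissiond and points all
# coordinators at it for admission control.
subprocess.run(['bin/start-impala-cluster.py', '--enable_admission_service'],
               check=True)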
Testing:
- Existing tests for the admission control service are modified to run
with an admissiond.
- Manually ran start-impala-cluster.py with --enable_admission_service
and --docker_network to verify Docker integration.
Change-Id: Id677814b31e9193035e8cf0d08aba0ce388a0ad9
Reviewed-on: http://gerrit.cloudera.org:8080/16891
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch supports creating required/optional fields for Iceberg
tables. If we set the 'NOT NULL' property for an Iceberg table column
in SQL, Impala will create a required field through the Iceberg API;
'NULL' or the default will create an optional field.
Besides, 'DESCRIBE XXX' for an Iceberg table will display the
nullability like this:
+------+--------+---------+----------+
| name | type | comment | nullable |
+------+--------+---------+----------+
| id | int | | false |
| name | string | | true |
| age | int | | true |
+------+--------+---------+----------+
And 'SHOW CREATE TABLE XXX' will also display the 'NULL'/'NOT NULL'
property for Iceberg tables.
Tests:
* added new test in iceberg-create.test
* added new test in iceberg-negative.test
* added new test in show-create-table.test
* modified 'DESCRIBE XXX' results in iceberg-create.test
* modified 'DESCRIBE XXX' results in iceberg-alter.test
* modified create table results in show-create-table.test
Change-Id: I70b8014ba99f43df1b05149ff7a15cf06b6cd8d3
Reviewed-on: http://gerrit.cloudera.org:8080/16904
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch contains a variety of small refactors needed to enable the
new admission control daemon, to separate them out from the main patch
for ease of reviewing.
The following changes are made:
- A new class is introduced, DaemonEnv, which contains singleton
objects common to all Impala daemons, currently just a MetricGroup
and Webservers. The purpose is to reduce code duplication when the
new admissiond daemon is added. This is analogous to how ExecEnv is
used for impalad-specific singletons already.
This patch modifies the catalogd and statestored to use DaemonEnv.
impalads could also use DaemonEnv, but its tricky due to
dependencies in the order of creation and initialization for objects
such as the ReservationTracker and BufferPool relative to the
MetricGroup and Webserver, so this is left for followup work.
- Direct use of ExecEnv in ImpalaServicePool and AdmissionController
is removed, as the admissiond will also need to use these classes
and it will not have an ExecEnv.
Testing:
- Passed a run of existing core tests.
Change-Id: I2e097e20458354f78bfc3477cac6fb3a2835f094
Reviewed-on: http://gerrit.cloudera.org:8080/16890
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
A username can be determined for a session via two mechanisms:
* In a secure env, the user is authenticated by LDAP or Kerberos
* In an unsecure env, the client specifies the user name, either
as a parameter to the OpenSession API (HS2) or as a parameter
to the first query run (beeswax)
This patch affects what happens if neither of the above mechanisms
is used. Previously we would end up with the username being an
empty string, but this makes Ranger unhappy. Hive uses the name
"anonymous" in this situation, so we change Impala's behaviour too.
This is configurable by -anonymous_user_name. -anonymous_user_name=
reverts to the old behaviour.
Testing:
* Add an end-to-end test that exercises this via impala-shell for
HS2, HS2-HTTP and beeswax protocols.
* Tweak a couple of existing tests that depended on the previous
behavior.
Change-Id: I6db491231fa22484aed476062b8fe4c8f69130b0
Reviewed-on: http://gerrit.cloudera.org:8080/16902
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
In some cases Kudu plans could contain more hosts than the actual number of executors.
This commit fixes it by capping the number of hosts at the number of executors,
and determining which executors have local scan ranges.
Testing:
- Ran core tests
- Updated Kudu planner tests where the memory estimates changed.
Change-Id: I72e341597e980fb6a7e3792905b942ddf5797d03
Reviewed-on: http://gerrit.cloudera.org:8080/16880
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
These tests were disabled due to S3's eventually consistent
behavior. Now that S3 is strongly consistent, these tests do
not need to be disabled.
Testing:
- Ran s3 core job
Change-Id: Ie9041f530bf3a818f8954b31a3d01d9f6753d7d4
Reviewed-on: http://gerrit.cloudera.org:8080/16931
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
QueueNode::not_admitted_reason can be accessed concurrently by the
coordinator thread that calls SubmitForAdmission and the admission
control dequeue loop thread.
This patch fixes this by ensuring that not_admitted_reason is only
accessed if 'admission_ctrl_lock_' is held.
Change-Id: Iacb3f37d8e1797c2b1d7bc32ba6368419e9ae444
Reviewed-on: http://gerrit.cloudera.org:8080/16926
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Incorporated review comments:
- removed the para as per the feedback
- listed all the overloads that are introduced
- stated that Impala does not yet support new Hive UDFs
- called out how mask functions were introduced through overloads
Change-Id: I37f0bcf4cf586cc5cfd03e4df68443967b6bb88f
Reviewed-on: http://gerrit.cloudera.org:8080/16861
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
For OR predicates that reference a view, currently the
ConvertToCNFRule does not get applied, since such a predicate is
considered a single table predicate even if it might reference columns
from different tables within the view. This patch enables the
application of this rule for such predicates by checking the expanded
view; if it satisfies the criterion, the rule can be applied and the
predicate can eventually be pushed to the scan.
Testing:
Added planner test in inline-view.test
Change-Id: Ie7a9a215d6b92aec07153e643268370f34186c88
Reviewed-on: http://gerrit.cloudera.org:8080/16912
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently, the result section of the testfile is required to use
escaped strings. Take the following result section as an example:
--- RESULTS
'Alice\nBob'
'Alice\\nBob'
The first line is a string with a newline character. The second line is
a string with a '\' and an 'n' character. When comparing with the actual
query results, we need to escape the special characters in the actual
results, e.g. replace newline characters with '\n'. This is done by
invoking encode('unicode_escape') on the actual result strings. However,
the input type of this method is unicode instead of str. When calling it
on str vars, Python will implicitly convert the input vars to unicode
type. The default encoding, ascii, is used. This causes
UnicodeDecodeError when the str contains non-ascii bytes. To fix this,
this patch explicitly decodes the input str using 'utf-8' encoding.
After fixing the logic of escaping the actual result strings, the next
problem is that it's painful to write unicode-escaped expected results.
Here is an example:
---- QUERY
select "你好\n你好"
---- RESULTS
'\u4f60\u597d\n\u4f60\u597d'
---- TYPES
STRING
It's painful to manually translate the unicode characters.
This patch adds a new comment, RAW_STRING, for the result section to use
raw strings instead of unicode-escaped strings. Here is an example:
---- QUERY
select "你好"
---- RESULTS: RAW_STRING
'你好'
---- TYPES
STRING
If the result contains special characters, it's recommended to use the
default string mode. If the special characters only contain newline
characters, we can use RAW_STRING and the existing MULTI_LINE comment
together.
This patch also fixes the issue that pytest fails to report assertion
failures if any of the compared str values contain non-ascii bytes
(IMPALA-10419). However, pytest works if the compared values are both
in unicode type. So we explicitly convert the actual and expected str
values to unicode type.
Test:
- Add tests in special-strings.test for raw string mode and the escaped
string mode (default).
- Run test_exprs.py::TestExprs::test_special_strings locally.
Change-Id: I7cc2ea3e5849bd3d973f0cb91322633bcc0ffa4b
Reviewed-on: http://gerrit.cloudera.org:8080/16919
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This is a bit of cleanup left over from the KRPC work that could avoid
some lock contention for queries with large numbers of fragments.
The change is just to do cancellation of receivers once per query
instead of once per fragment.
Change-Id: I7677d21f0aaddc3d4b56f72c0470ea850e34611e
Reviewed-on: http://gerrit.cloudera.org:8080/16901
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
For an inferred equality predicate of type c1 = c2, if both sides
refer to the same underlying tuple and slot, it is an identity
predicate which should not be evaluated by the SELECT node, since
doing so will incorrectly eliminate NULL rows. This patch fixes the
behavior.
Testing:
- Added planner tests with base table and with outer join
- Added runtime tests with base table and with outer join
- Added planner test for IMPALA-9694 (same root cause)
- Ran PlannerTest .. no other plans changed
Change-Id: I924044f582652dbc50085851cc639f3dee1cd1f4
Reviewed-on: http://gerrit.cloudera.org:8080/16917
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently EXPLAIN statements might open ACID transactions and
create locks on ACID tables.
This is not necessary since we won't modify the table. But the
real problem is that these transactions and locks are leaked and
open forever. They are even getting heartbeated while the
coordinator is still running.
The solution is to not consume any ACID resources for EXPLAIN
statements.
Testing:
* Added EXPLAIN INSERT OVERWRITE in front of an actual INSERT OVERWRITE
in an e2e test
Change-Id: I05113b1fd9a3eb2d0dd6cf723df916457f3fbf39
Reviewed-on: http://gerrit.cloudera.org:8080/16923
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Previously, queries with analytic functions would materialize the
unassigned conjuncts. However, predicates that can be evaluated by
Kudu don't need to be materialized.
This optimization can reduce the amount of data to exchange and sort.
Testing:
- Add planner test in analytic-fns.test
Change-Id: Iba8371eff6ae1bcffd51b44843175c52f2127e46
Reviewed-on: http://gerrit.cloudera.org:8080/16905
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
This avoids calling precacheChildrenOf() in cases when the
cached values will never be used. This change simply skips
calling precacheChildrenOf() in the cases when getPermissions()
is never called.
There is some opportunity to clean up this permissions
checking further, but I decided to keep this fix limited
in scope.
Change-Id: I2034695a956307309f656d56aa57aa07ae5163d8
Reviewed-on: http://gerrit.cloudera.org:8080/16898
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
BufferedTupleStream::DebugString() iterates over a std::list<Page>
that can potentially grow very large. As a consequence, the returned
string can grow large as well and cause problems, as previously
happened in IMPALA-9851. With this patch,
BufferedTupleStream::DebugString() only includes at most the first 100
pages of the page list.
Testing:
- Add new be test SimpleTupleStreamTest.ShortDebugString in
buffered-tuple-stream-test.cc
- Pass core tests
Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d
Reviewed-on: http://gerrit.cloudera.org:8080/16884
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
TestResultSpoolingFetchSize.test_fetch has been flaky in the
ubuntu-16.04-dockerised environment, not reaching the finished state
within 10 seconds. This patch increases the timeout of the test to 30
seconds.
Testing:
- Looped the test locally.
Change-Id: Id2e8a9db904da5f1e4acc9e18b3987b8a4ec24e5
Reviewed-on: http://gerrit.cloudera.org:8080/16895
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for limiting the rows produced by a join node
such that runaway join queries can be prevented.
The limit is specified by a query option. Queries exceeding that limit
get terminated. The checking runs periodically, so the actual rows
produced may go somewhat over the limit.
JOIN_ROWS_PRODUCED_LIMIT is exposed as an advanced query option.
The query profile is updated to include query-wide and per-backend
metrics for rows produced (RowsReturned). Example from:
set JOIN_ROWS_PRODUCED_LIMIT = 10000000;
select count(*) from tpch_parquet.lineitem l1 cross join
(select * from tpch_parquet.lineitem l2 limit 5) l3;
NESTED_LOOP_JOIN_NODE (id=2):
- InactiveTotalTime: 107.534ms
- PeakMemoryUsage: 16.00 KB (16384)
- ProbeRows: 1.02K (1024)
- ProbeTime: 0.000ns
- RowsReturned: 10.00M (10002025)
- RowsReturnedRate: 749.58 K/sec
- TotalTime: 13s337ms
Testing:
Added tests for JOIN_ROWS_PRODUCED_LIMIT
Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Reviewed-on: http://gerrit.cloudera.org:8080/16706
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>