Mirror of https://github.com/apache/impala.git
(synced 2026-02-03 09:00:39 -05:00, at commit efc627d050caeb9947af2dfd3fc8a02236c44d0e)
558 Commits
5e9f10d34c
IMPALA-10064: Support constant propagation for eligible range predicates
This patch adds support for constant propagation of range predicates involving date and timestamp constants. Previously, only equality predicates were considered for propagation. The new type of propagation is shown by the following example:

Before constant propagation:
  WHERE date_col = CAST(timestamp_col as DATE)
    AND timestamp_col BETWEEN '2019-01-01' AND '2020-01-01'

After constant propagation:
  WHERE date_col >= '2019-01-01' AND date_col <= '2020-01-01'
    AND timestamp_col >= '2019-01-01' AND timestamp_col <= '2020-01-01'
    AND date_col = CAST(timestamp_col as DATE)

As a consequence, since Impala supports table partitioning by date columns but not timestamp columns, the above propagation enables partition pruning based on timestamp ranges.

Existing code for equality based constant propagation was refactored and consolidated into a new class which handles both equality and range based constant propagation. Range based propagation is only applied to date and timestamp columns.

Testing:
- Added new range constant propagation tests to PlannerTest.
- Added e2e test for range constant propagation based on a newly added date partitioned table.
- Ran precommit tests.

Change-Id: I811a1f8d605c27c7704d7fc759a91510c6db3c2b
Reviewed-on: http://gerrit.cloudera.org:8080/16346
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
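For illustration, a query of the shape below (hypothetical table and column names, not from the commit) is the kind of statement that benefits: the timestamp range reaches the DATE partition column, so only the matching partitions are scanned.

  -- Hypothetical date-partitioned table mirroring an event timestamp.
  CREATE TABLE events (ts TIMESTAMP, payload STRING)
  PARTITIONED BY (event_date DATE)
  STORED AS PARQUET;

  -- The range on ts is propagated to event_date through the equality
  -- event_date = CAST(ts AS DATE), enabling partition pruning.
  SELECT count(*)
  FROM events
  WHERE event_date = CAST(ts AS DATE)
    AND ts BETWEEN '2019-01-01' AND '2020-01-01';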
f4273a40fe
IMPALA-7310: Partial fix for NDV cardinality with NULLs.
This fix handles the case where a column's cardinality is zero even though
the column is nullable and the null stats indicate there are null values;
in that case we adjust the cardinality from 0 to 1.
A cardinality of zero was especially problematic when calculating
cardinalities for multiple predicates with multiplication. The 0 would
propagate up the plan tree and result in poor plan choices, such as
always using broadcast joins where shuffle would have been the better choice.
Testing:
* 26 Node TPC-DS 30TB run had better plans for Q4 and Q11
- Q4 172s -> 80s
- Q11 103s -> 77s
* CardinalityTest
* TpcdsPlannerTest
Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032
Reviewed-on: http://gerrit.cloudera.org:8080/16349
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
827070b473
IMPALA-10099: Push down DISTINCT in Set operations
INTERSECT/EXCEPT are not duplicate-preserving operations. The distinct aggregations can happen in each operand, in the leftmost operand only, or after all the operands in a separate aggregation step. Except for a couple of special cases, we would use the last strategy most often.

This change pushes the distinct aggregation down to the leftmost operand in cases where there are no analytic functions, or when a distinct or grouping operation already eliminates duplicates. In general, DISTINCT placement such as in this case should be done throughout the entire plan tree in a cost-based manner, as described in IMPALA-5260.

Testing:
* TpcdsPlannerTest
* PlannerTest
* TPC-DS 30TB perf run for any affected queries
  - Q14-1 180s -> 150s
  - Q14-2 109s -> 90s
  - Q8 no significant change
* SetOperation planner tests
* Analyzer tests
* Tpcds functional workload

Change-Id: Ia248f1595df2ab48fbe70c778c7c32bde5c518a5
Reviewed-on: http://gerrit.cloudera.org:8080/16350
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
0fcf846592
IMPALA-10095: Include query plan tests for all of TPC-DS
Added TpcdsPlannerTest to include each TPC-DS query as a separate plan test file. Removed the previous tpcds-all test file. This means that when running only PlannerTest no TPC-DS plans are checked; however, as part of a full frontend test run TpcdsPlannerTest will be included. Runs with cardinality and resource checks, as well as using parquet tables to include predicate pushdowns.

Change-Id: Ibaf40d8b783be1dc7b62ba3269feb034cb8047da
Reviewed-on: http://gerrit.cloudera.org:8080/16345
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
e133d1838a
IMPALA-7782: fix constant NOT IN subqueries that can return 0 rows
The bug was that the statement rewriter converted NOT IN <subquery> predicates to != <subquery> predicates when the subquery could be an empty set. This was invalid, because NOT IN (<empty set>) is true, but != (<empty set>) is false.

Testing: Added targeted planner and end-to-end tests. Ran exhaustive tests.

Change-Id: I66c726f0f66ce2f609e6ba44057191f5929a67fc
Reviewed-on: http://gerrit.cloudera.org:8080/16338
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
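A minimal illustration of the semantics (hypothetical tables t and s, not from the commit), assuming the subquery matches no rows:

  -- NOT IN over an empty set is TRUE, so every row of t qualifies.
  SELECT * FROM t WHERE t.x NOT IN (SELECT s.y FROM s WHERE false);

  -- Rewriting the predicate to t.x != (SELECT s.y FROM s WHERE false) is not
  -- equivalent: a comparison against an empty scalar subquery never evaluates
  -- to TRUE, so the rewritten query would return no rows.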
5fedf7bf72
IMPALA-9744: Treat corrupt table stats as missing to avoid bad plans
This work addresses the current limitation in computing the total row count for a Hive table in a scan. The row count can be incorrectly computed as 0, even though there exists data in the Hive table. This is stats corruption at the table level. Similar stats corruption exists for a partition. The row count of a table or a partition can sometimes also be -1, which indicates a missing stats situation.

In the fix, as long as no partition in a Hive table exhibits any missing or corrupt stats, the total row count for the table is computed from the row counts in all partitions. Otherwise, Impala looks at the table-level stats, particularly the table row count. In addition, if the table stats are missing or corrupt, Impala estimates a row count for the table, if feasible. This row count is the sum of the row count from the partitions with good stats, and an estimation of the number of rows in the partitions with missing or corrupt stats. Such estimation also applies when some partition has corrupt stats.

One way to observe the fix is through the explain output of queries scanning Hive tables with missing or corrupt stats. The cardinality for any full scan should be a positive value (i.e. the estimated row count), instead of 'unavailable'. At the beginning of the explain output, the table is still listed in the WARNING section for potentially corrupt table statistics.

Testing:
1. Ran unit tests with queries documented in the case against Hive tables with the following configurations:
   a. No stats corruption in any partitions
   b. Stats corruption in some partitions
   c. Stats corruption in all partitions
2. Added two new tests in test_compute_stats.py:
   a. test_corrupted_stats_in_partitioned_Hive_tables
   b. test_corrupted_stats_in_unpartitioned_Hive_tables
3. Fixed failures in corrupt-stats.test
4. Ran "core" test

Change-Id: I9f4c64616ff7c0b6d5a48f2b5331325feeff3576
Reviewed-on: http://gerrit.cloudera.org:8080/16098
Reviewed-by: Sahil Takiar <stakiar@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
da34d34a42
IMPALA-9859: Full ACID Milestone 4: Part 2 Reading modified tables (complex types)
This implements scanning full ACID tables that contain complex types. The same technique that we use for primitive types works here as well, i.e. we add a LEFT ANTI JOIN on top of the HDFS scan node in order to subtract the deleted rows from the inserted rows.

However, there were some types of queries where we couldn't do that: the queries that scan the nested collection items directly, e.g.:

  SELECT item FROM complextypestbl.int_array;

The above query only creates a single tuple descriptor that holds the collection items. Since this tuple descriptor is not at the table level, we cannot add slot references to the hidden ACID columns, which are at the top level of the table schema. To resolve this I added a statement rewriter that rewrites the above statement to the following:

  SELECT item FROM complextypestbl $a$1, $a$1.int_array;

Now in this example we'll have two tuple descriptors, one for the table level, and one for the collection item. So we can add the ACID slot refs to the table-level tuple descriptor. The rewrite is implemented by the new AcidRewriter class.

Performance:
I executed the following query with num_nodes=1 on a non-transactional table (without the rewrite), and on an ACID table (with the rewrite):

  select count(*) from customer_nested.c_orders.o_lineitems;

Without the rewrite: Fetched 1 row(s) in 0.41s
  Operator | #Hosts | #Inst | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail
  F00:ROOT | 1 | 1 | 13.61us | 13.61us | | | 0 B | 0 B |
  01:AGGREGATE | 1 | 1 | 3.68ms | 3.68ms | 1 | 1 | 16.00 KB | 10.00 MB | FINALIZE
  00:SCAN HDFS | 1 | 1 | 280.47ms | 280.47ms | 6.00M | 15.00M | 56.98 MB | 8.00 MB | tpch_nested_orc_def.customer.c_orders.o_lineitems

With the rewrite: Fetched 1 row(s) in 0.42s
  Operator | #Hosts | #Inst | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail
  F00:ROOT | 1 | 1 | 25.16us | 25.16us | | | 0 B | 0 B |
  05:AGGREGATE | 1 | 1 | 3.44ms | 3.44ms | 1 | 1 | 63.00 KB | 10.00 MB | FINALIZE
  01:SUBPLAN | 1 | 1 | 16.52ms | 16.52ms | 6.00M | 125.92M | 47.00 KB | 0 B |
  |--04:NESTED LOOP JOIN | 1 | 1 | 188.47ms | 188.47ms | 0 | 10 | 24.00 KB | 12 B | CROSS JOIN
  |  |--02:SINGULAR ROW SRC | 1 | 1 | 0ns | 0ns | 0 | 1 | 0 B | 0 B |
  |  03:UNNEST | 1 | 1 | 25.37ms | 25.37ms | 0 | 10 | 0 B | 0 B | $a$1.c_orders.o_lineitems o_lineitems
  00:SCAN HDFS | 1 | 1 | 96.26ms | 96.26ms | 100.00K | 12.59M | 38.19 MB | 72.00 MB | default.customer_nested $a$1

So the overhead is very small.

Testing:
* Added planner tests to PlannerTest/acid-scans.test
* E2E query tests to QueryTest/full-acid-complex-type-scans.test
* E2E tests for rowid-generation: QueryTest/full-acid-rowid.test

Change-Id: I8b2c6cd3d87c452c5b96a913b14c90ada78d4c6f
Reviewed-on: http://gerrit.cloudera.org:8080/16228
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
a0057788c5
IMPALA-9478: Profiles should indicate if custom UDFs are being used
Adds a marker to runtime profiles and explain plans indicating if custom
(i.e. non-built-in) user-defined functions are being used. For explain
plans, a SQL-style comment is added after any function call. For runtime
profiles, a new Frontend entry called "User Defined Functions (UDFs)"
lists out all UDFs analyzed during planning.
Take the following example:
create function hive_lower(string) returns string location
'/test-warehouse/hive-exec.jar'
symbol='org.apache.hadoop.hive.ql.udf.UDFLower';
set explain_level=3;
explain select * from functional.alltypes order by hive_lower(string_col);
...
01:SORT
order by: default.hive_lower(string_col) /* JAVA UDF */ ASC
materialized: default.hive_lower(string_col) /* JAVA UDF */
...
This shows up in the runtime profile as well.
When the above query is actually run, the runtime profile includes the
following entry:
Frontend
User Defined Functions (UDFs): default.hive_lower
Error messages will also include SQL-style comments about any UDFs used.
For example:
select aggfn(int_col) over (partition by int_col) from
functional.alltypesagg
Throws:
Aggregate function 'default.aggfn(int_col) /* NATIVE UDF */' not
supported with OVER clause.
Testing:
* Added tests to test_udfs.py
* Ran core tests
Change-Id: I79122e6cc74fd5a62c76962289a1615fbac2f345
Reviewed-on: http://gerrit.cloudera.org:8080/16188
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
0a13029afc
IMPALA-8125: Add query option to limit number of hdfs writer instances
This patch adds a new query option MAX_FS_WRITERS that limits the number of HDFS writer instances.

Highlights:
- Depending on the plan, it either restricts the num of instances of the root fragment or adds an exchange and then limits the num of instances of that.
- Assigns instances evenly across available backends.
- The "no-shuffle" query hint is ignored when using the query option.
- Plan behavior only changes when this query option is used.
- The only exception to the previous point is that the optimization logic that decides to add an exchange now looks at the num of instances instead of the number of nodes.

Limitation: A mismatch of cluster state between query planning and scheduling can result in more or fewer fragment instances being scheduled than expected. E.g. if max_fs_writers is 2 and the planner sees only 2 executors, then it might not add an exchange between a scan node and the table sink, but if there are 3 nodes during scheduling then that scan+tablesink instance will be scheduled on 3 backends.

Testing:
- Added planner tests to cover all cases where this enforcement kicks in and to highlight the behavior.
- Added e2e tests to confirm that the scheduler is enforcing the limit and distributing the instances evenly across backends for different plan shapes.

Change-Id: I17c8e61b9a32d908eec82c83618ff9caa41078a5
Reviewed-on: http://gerrit.cloudera.org:8080/16204
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
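A sketch of typical usage (table names are hypothetical; only the MAX_FS_WRITERS option itself comes from the commit):

  -- Cap the number of concurrent filesystem writer instances for this insert.
  SET MAX_FS_WRITERS=2;
  INSERT INTO sales_archive
  SELECT * FROM sales WHERE sale_date < '2020-01-01';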
ecfc1af0db
IMPALA-9983 : Pushdown limit to analytic sort operator
This patch pushes the LIMIT from a top-level Sort down to the Sort below an Analytic operator when it is safe to do so. There are several qualifying checks that are done. The optimization is done at the time of creating the top-level Sort in the single node planner. When the pushdown is applicable, the analytic sort is converted to a TopN sort. Further, this is split into a bottom TopN and an upper TopN separated by a hash partition exchange. This ensures that the limit is applied as early as possible, before hash partitioning.

Fixed a couple of additional related issues uncovered as a result of the limit pushdown:
- Changed the analytic sort's partition-by expr sort semantic from NULLS FIRST to NULLS LAST to ensure correctness in the presence of a limit.
- The LIMIT on the analytic sort node was causing it to be treated as a merging point in the distributed planner. Fixed it by introducing an API allowPartitioned() in the PlanNode.

Testing:
- Ran PlannerTest and updated several EXPLAIN plans.
- Added planner tests for both positive and negative cases of limit pushdown.
- Ran end-to-end TPC-DS queries. Specifically tested TPC-DS q67 for limit pushdown and result correctness.
- Added targeted end-to-end tests using the TPC-H dataset.

Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284
Reviewed-on: http://gerrit.cloudera.org:8080/16219
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
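For illustration (hypothetical table; whether the pushdown fires depends on the qualifying checks), a top-N pattern of roughly this shape, similar to TPC-DS q67, is what the optimization targets:

  -- The outer LIMIT can be pushed below the analytic's sort when safe,
  -- turning the analytic sort into a TopN sort.
  SELECT *
  FROM (
    SELECT item, store, revenue,
           RANK() OVER (ORDER BY revenue DESC) AS rnk
    FROM store_revenue
  ) v
  ORDER BY rnk
  LIMIT 100;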
ea3f073881
IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
INTERSECT and EXCEPT set operations are implemented as rewrites to joins. Currently only the DISTINCT-qualified operators are implemented, not ALL-qualified. The operator MINUS is supported as an alias for EXCEPT. We mimic Oracle and Hive's non-standard implementation which treats all operators as having the same precedence, as opposed to the SQL standard of giving INTERSECT higher precedence.

A new class SetOperationStmt was created to encompass the previous UnionStmt behavior. UnionStmt is preserved as a special case of union-only operands to ensure compatibility with previous union planning behavior.

Tests:
* Added parser and analyzer tests.
* Ensured no test failures or plan changes for union tests.
* Added TPC-DS queries 14, 38, 87 to functional and planner tests.
* Added functional tests test_intersect, test_except
* New planner testSetOperationStmt

Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Reviewed-on: http://gerrit.cloudera.org:8080/16123
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
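As an illustration of the join-based rewrite (hypothetical tables; the exact join shape and null handling are chosen by the planner, so this is only a conceptual equivalent):

  -- Set operation:
  SELECT x FROM t1
  INTERSECT
  SELECT x FROM t2;

  -- Roughly corresponds to a distinct semi join; EXCEPT would correspond
  -- to a LEFT ANTI JOIN instead.
  SELECT DISTINCT t1.x
  FROM t1 LEFT SEMI JOIN t2
    ON t1.x IS NOT DISTINCT FROM t2.x;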
63f5e8ec00
IMPALA-1270: add distinct aggregation to semi joins
When generating plans with left semi/anti joins (typically resulting from subquery rewrites), the planner now considers inserting a distinct aggregation on the inner side of the join. The decision is based on whether that aggregation would reduce the number of rows by more than 75%. This is fairly conservative and the optimization might be beneficial for smaller reductions, but the conservative threshold is chosen to reduce the number of potential plan regressions.

The aggregation can both reduce the # of rows and the width of the rows, by projecting out unneeded slots.

ENABLE_DISTINCT_SEMI_JOIN_OPTIMIZATION query option is added to allow toggling the optimization.

Tests:
* Add positive and negative planner tests for various cases - including semi/anti joins, missing stats, broadcast/shuffle, different numbers of join predicates.
* Add some end-to-end tests to verify plans execute correctly.

Change-Id: Icbb955e805d9e764edf11c57b98f341b88a37fcc
Reviewed-on: http://gerrit.cloudera.org:8080/16180
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
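A hypothetical example of a query where this could apply (names invented): the IN subquery becomes a left semi join, and the planner may add a distinct aggregation on o.customer_id on the inner side when stats suggest a greater than 75% reduction.

  SET ENABLE_DISTINCT_SEMI_JOIN_OPTIMIZATION=true;
  SELECT c.name
  FROM customers c
  WHERE c.id IN (SELECT o.customer_id FROM orders o);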
f602c3f80f
IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)
Hive ACID supports row-level DELETE and UPDATE operations on a table. It achieves this by assigning a unique row-id to each row, and maintaining two sets of files in a table. The first set is in the base/delta directories; they contain the INSERTed rows. The second set of files is in the delete-delta directories; they contain the DELETEd rows. (UPDATE operations are implemented via DELETE+INSERT.) In the filesystem it looks like e.g.:
* full_acid/delta_0000001_0000001_0000/0000_0
* full_acid/delta_0000002_0000002_0000/0000_0
* full_acid/delete_delta_0000003_0000003_0000/0000_0

During scanning we need to return the INSERTed rows minus the DELETEd rows. This patch implements it by creating an ANTI JOIN between the INSERT and DELETE sets. It is a planner-only modification. Every HDFS SCAN that scans full ACID tables (that also have deleted rows) is converted to two HDFS SCANs, one for the INSERT deltas, and one for the DELETE deltas. Then a LEFT ANTI HASH JOIN with BROADCAST distribution mode is created above them.

Later we can add support for other distribution modes if the performance requires it. E.g. if we have too many deleted rows then we are probably better off with PARTITIONED distribution mode. We could estimate the number of deleted rows by sampling the delete delta files.

The current patch only works for primitive types, i.e. we cannot select nested data if the table has deleted rows.

Testing:
* added planner test
* added e2e tests

Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Reviewed-on: http://gerrit.cloudera.org:8080/16082
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
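Conceptually, the generated plan corresponds to an anti join like the sketch below. This is not SQL a user writes; the planner builds it internally on the hidden ACID columns, and insert_deltas / delete_deltas here only stand for the scans of the two file sets.

  SELECT ins.*
  FROM insert_deltas ins
  LEFT ANTI JOIN delete_deltas del
    ON  ins.originalTransaction = del.originalTransaction
    AND ins.bucket = del.bucket
    AND ins.rowId = del.rowId;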
574fef2a76
IMPALA-9917: grouping() and grouping_id() support
Implements the grouping() and grouping_id() builtins. grouping_id() has both a no-arg version, which returns a bit vector of all grouping exprs, and a varargs version, which returns a bit vector of the provided arguments. Grouping is a keyword, so it needs special handling in the parser to be accepted as a function name.

These functions are implemented in the transpose agg with a CASE expression similar to other aggregate functions, but returning the grouping() or grouping_id() value for that aggregation class instead of an aggregated value.

Testing:
* Added parser test for the grouping keyword.
* Added analysis tests for the functions.
* Added basic planner test to show expressions generated
* Added some TPC-DS queries that use grouping() - queries 80, 70 and 86 using reference .test files from Fang-Yu Rao. 27 and 36 were added with reference results from https://github.com/cwida/tpcds-result-reproduction
* Add targeted end-to-end tests.
* Added view compatibility test with Hive.

Change-Id: If0b1640d606256c0fe9204d2a21a8f6d06abcdb6
Reviewed-on: http://gerrit.cloudera.org:8080/16140
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
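A small usage sketch (hypothetical table and columns): grouping(col) is 1 when col is rolled up (NULL because of the grouping set) and 0 otherwise, and grouping_id() packs those bits into a single value.

  SELECT region, product,
         SUM(amount)       AS total,
         grouping(region)  AS region_rolled_up,
         grouping(product) AS product_rolled_up,
         grouping_id()     AS gid
  FROM sales
  GROUP BY ROLLUP(region, product);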
3e1e7da229
IMPALA-9898: generate grouping set plans
Integrates the parsing and analysis with plan generation.

Testing:
* Add analysis test to make sure we reject unsupported queries.
* Added targeted planner tests to ensure we generate the correct aggregation classes for a variety of cases.
* Add targeted end-to-end functional tests. Added five TPC-DS queries that use ROLLUP, building on some work done by Fang-Yu Rao. Some tweaks were required for these tests.
  * Add an extra ORDER BY clause to q77 to make fully deterministic.
  * Add backticks around `returns` to avoid reserved word.
  * Add INTERVAL keyword to date/timestamp arithmetic.
  We can run q80, too, but I haven't added or verified results yet - that can be done in a follow-up.

Change-Id: Ie454c5bf7aee266321dee615548d7f2b71380197
Reviewed-on: http://gerrit.cloudera.org:8080/16128
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
4e2498da6f
IMPALA-9949: fix SELECT list subqueries with HAVING/LIMIT
The patch for IMPALA-8954 failed to account for subqueries that could produce < 1 row. SelectStmt.returnsSingleRow() is confusing because it actually returns true if it returns *at most* one row. As a fix I split it into returnsExactlyOneRow() and returnsAtMostOneRow(), then used returnsExactlyOneRow() to determine if the subquery should instead be rewritten into a LEFT OUTER JOIN, which produces the correct result. CROSS JOIN is still preferred because it can be more freely reordered during planning.

Testing:
* Added planner tests for a range of scenarios where it can be rewritten as a CROSS JOIN and where it needs to be a LEFT OUTER JOIN for correctness.
* Added some targeted end-to-end tests where the results were previously incorrect. Checked the behaviour against Hive and postgres.
Ran exhaustive tests.

Change-Id: I6034aedac776783bdc8cdb3a2df344e2b3662da6
Reviewed-on: http://gerrit.cloudera.org:8080/16171
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
fea5dffec5
IMPALA-9924: handle single subquery in or predicate
This patch supports a subset of cases of subqueries inside OR inside WHERE and HAVING clauses. The approach used is to rewrite the subquery into a many-to-one LEFT OUTER JOIN with the subquery and then replace the subquery in the expression with a reference to the single select list expression of the subquery. This works because:
* A many-to-one LEFT OUTER JOIN returns one output row for each left input row, meaning that for every row in the original query before the rewrite, we get the same row plus a single matched row from the subquery
* Expressions can be rewritten to refer to a slotref from the right side of the LEFT OUTER JOIN without affecting semantics. E.g. an IN subquery becomes <slot> IS NOT NULL, or <operator> (<subquery>) becomes <operator> <slot>.

This does not affect SELECT list subqueries, which are rewritten using a different mechanism that can already support some subqueries in disjuncts. Correlated and uncorrelated subqueries are both supported, but various limitations are present.

Limitations:
* Only one subquery per predicate is supported. The rewriting approach should generalize to multiple subqueries but other code needs refactoring to handle this case.
* EXISTS and NOT EXISTS subqueries are not supported. The rewriting approach can generalise to that, but we need to add or pick a select list item from the subquery to check for NULL/IS NOT NULL and a little more work is required to do that correctly.
* NOT IN is not supported because of the special NULL semantics.
* Subqueries with aggregates + grouping by are not supported because we rely on adding distinct to the select list and we don't support distinct + aggregations because of IMPALA-5098.

Tests:
* Positive analysis tests for IN and binary predicate operators.
* Negative analysis tests for unsupported subquery operators.
* Negative analysis tests for multiple subqueries.
* Negative analysis tests for runtime scalar subqueries.
* Positive and negative analysis tests for aggregations in subquery.
* TPC-DS Query 45 planner and query tests
* Targeted planner tests for various supported queries.
* Targeted functional tests to confirm plans are executable and return correct results. These exercise a mix of the supported features - correlated/uncorrelated, aggregate functions, EXISTS/comparator, etc.
* Tests for the BETWEEN predicate, which is supported as a side-effect of being rewritten during analysis.

Change-Id: I64588992901afd7cd885419a0b7f949b0b174976
Reviewed-on: http://gerrit.cloudera.org:8080/16152
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
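A hypothetical example of a predicate shape this change supports (names invented): a single IN subquery under an OR in the WHERE clause.

  SELECT c.id, c.name
  FROM customers c
  WHERE c.vip = true
     OR c.id IN (SELECT o.customer_id FROM big_orders o);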
55099517b0
IMPALA-9294: Support DATE for min-max runtime filter
Implemented the Date min-max filter and applied it to Kudu as with the other min-max runtime filters. Added new test cases for Date min-max filters.

Testing: Passed all core tests.

Change-Id: Ic2f6e2dc6949735d5f0fcf317361cc2969a5e82c
Reviewed-on: http://gerrit.cloudera.org:8080/16103
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2dca55695e
IMPALA-9784, IMPALA-9905: Uncorrelated subqueries in HAVING.
Support rewriting subqueries in the HAVING clause by nesting the aggregation query and pulling up the subquery predicates into the outer WHERE clause.

Testing:
* New analyzer tests
* New functional subquery tests
* Added Q23, Q24 and Q44 to the tpcds workload
* Ran subquery rewrite tests

Change-Id: I124a58a09a1a47e1222a22d84b54fe7d07844461
Reviewed-on: http://gerrit.cloudera.org:8080/16052
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
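A hypothetical example of the newly supported pattern (names invented): an uncorrelated scalar subquery in HAVING, which the rewrite handles by nesting the aggregation and filtering in an outer WHERE clause.

  SELECT store, SUM(amount) AS total
  FROM sales
  GROUP BY store
  HAVING SUM(amount) > (SELECT AVG(amount) FROM sales);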
388ad555d7
IMPALA-8954: Uncorrelated scalar subqueries in the select list
Extend StmtRewriter with the ability to rewrite scalar subqueries in the select list into cross joins. Currently the subquery must pass plan-time checks to determine that it returns a single row, which may miss cases that may be valid at runtime or with more complex evaluation of the predicate expressions in the planner. Support for correlated subqueries will be a follow-on change.

Testing:
* Added new analyzer tests, updated previous subquery tests
* test_queries.py::TestQueries::test_subquery
* Added test_tpcds_q9 to e2e and planner tests

Change-Id: Ibcf55d26889aa01d69bb85f18c9241dda095fb66
Reviewed-on: http://gerrit.cloudera.org:8080/16007
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
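For illustration (hypothetical table), this is the kind of statement the rewrite handles: the uncorrelated scalar subquery is evaluated once and joined to every outer row via a cross join.

  SELECT o.id,
         o.amount,
         (SELECT MAX(amount) FROM orders) AS max_amount
  FROM orders o;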
08a7569d4c
IMPALA-9911: Fix IS [NOT] NULL predicate selectivity
When the null count is 0, the IsNullPredicate's selectivity was not being computed, since the code did not distinguish between a -1 (no stats) and a 0 null count. This caused a default selectivity estimate to be applied. This patch fixes it by explicitly checking whether the null count stat is present and, if so, using it regardless of whether it is 0 or more.

Testing:
- Added cardinality tests for IS NULL and IS NOT NULL.
- Ran PlannerTest and updated baseline plans.
- Updated expected selectivity for null predicate tests in ExprCardinalityTest.
- Ran precommit tests through gerrit-verify-dryrun

Change-Id: I46c084be780b8f5aead9e2b9656fbab6cc8c8874
Reviewed-on: http://gerrit.cloudera.org:8080/16131
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
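A rough sketch of the estimate this implies (the planner's exact formula may differ; the numbers are made up):

  column stats: numNulls = 0, numRows = 1,000,000
  selectivity(col IS NULL)     ~ numNulls / numRows     = 0.0
  selectivity(col IS NOT NULL) ~ 1 - numNulls / numRows = 1.0

so a genuine 0 null count now yields these values instead of falling back to the generic default selectivity.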
c7ce4fa109
IMPALA-9691: Support Kudu Timestamp and Date bloom filter
Impala stores timestamps as the 12-byte TimestampValue structure with nanosecond precision, while Kudu stores timestamps as 8-byte Unix time in microseconds. To avoid data truncation issues in the bloom filter, add a FunctionCallExpr with 'utc_to_unix_micros' as the root of the bloom filter's source expression to convert timestamp values to microseconds when building a timestamp bloom filter for Kudu.

Generated the functional date_tbl table in Kudu format for unit tests. Added new test cases for Kudu Timestamp and Date bloom filters.

Testing: Passed all core tests.

Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48
Reviewed-on: http://gerrit.cloudera.org:8080/16094
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
62729980d9
IMPALA-7020: fix costing of non-trivial CAST expressions
Some cast operations are quite expensive to evaluate,
which was not reflected in the uniform costing of CAST
expressions.
We fix this by increasing the cost of non-trivial
casts to be the same as an arbitrary function call.
Testing:
Ran exhaustive tests.
Add planner tests to check that CAST expressions are
materialized or not based on the input and output
types - the planner output lists 'materialized:'
expressions for the SORT operator.
A few existing planner tests had changes in predicate
ordering. I checked manually that these changes made
sense.
Perf:
I sanity-checked that this actually helped (a variant of)
the example query from IMPALA-7020. The following query
went from ~8s to ~2s in my dev environment:
select *
FROM
(
SELECT
o.*,
ROW_NUMBER() OVER(ORDER BY evt_ts DESC) AS rn
FROM
(
SELECT
l_orderkey,l_partkey,l_linenumber,l_quantity, cast (l_shipdate as date) evt_ts
FROM
tpch_parquet.lineitem
) o
) r
WHERE
rn BETWEEN 1 AND 101
ORDER BY rn;
Change-Id: I3f1a16fc45191a2eedf38cc243c70173d44806c6
Reviewed-on: http://gerrit.cloudera.org:8080/16073
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
f15a311065
IMPALA-9709: Remove Impala-lzo from the development environment
This removes Impala-lzo from the Impala development environment. Impala-lzo is not built as part of the Impala build. The LZO plugin is no longer loaded. LZO tables are not loaded during dataload, and LZO is no longer tested. This removes some obsolete scan APIs that were only used by Impala-lzo. With this commit, Impala-lzo would require code changes to build against Impala.

The plugin infrastructure is not removed, and this leaves some LZO support code in place. If someone were to decide to revive Impala-lzo, they would still be able to load it as a plugin and get the same functionality as before. This plugin support may be removed later.

Testing:
- Dryrun of GVO
- Modified TestPartitionMetadataUncompressedTextOnly's test_unsupported_text_compression() to add LZO case

Change-Id: I3a4f12247d8872b7e14c9feb4b2c58cfd60d4c0e
Reviewed-on: http://gerrit.cloudera.org:8080/15814
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
bf5b20d17c
IMPALA-9814: fix mt_dop parallelism for analytic fns
This replaces getNumNodes() with getNumInstances() in AnalyticPlanner,
which fixes some cases of underparallelisation with analytic functions.
Here is an example query which is underparallelised by only partitioning
on a column, ss_store_sk, with NDV=6, when there are 3 backends with
mt_dop=3.
set mt_dop=3;
explain select count(*) over (partition by ss_addr_sk, ss_store_sk),
count(*) over (partition by ss_sold_date_sk, ss_store_sk)
from tpcds_parquet.store_sales;
Before:
+---------------------------------------------------------------------------+
| Explain String |
+---------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=61.50MB Threads=7 |
| Per-Host Resource Estimates: Memory=148MB |
| |
| PLAN-ROOT SINK |
| | |
| 06:EXCHANGE [UNPARTITIONED] |
| | |
| 04:ANALYTIC |
| | functions: count(*) |
| | partition by: ss_sold_date_sk, ss_store_sk |
| | row-size=28B cardinality=2.88M |
| | |
| 03:SORT |
| | order by: ss_sold_date_sk ASC NULLS FIRST, ss_store_sk ASC NULLS FIRST |
| | row-size=20B cardinality=2.88M |
| | |
| 02:ANALYTIC |
| | functions: count(*) |
| | partition by: ss_addr_sk, ss_store_sk |
| | row-size=20B cardinality=2.88M |
| | |
| 01:SORT |
| | order by: ss_addr_sk ASC NULLS FIRST, ss_store_sk ASC NULLS FIRST |
| | row-size=12B cardinality=2.88M |
| | |
| 05:EXCHANGE [HASH(ss_store_sk)] |
| | |
| 00:SCAN HDFS [tpcds_parquet.store_sales] |
| HDFS partitions=1824/1824 files=1824 size=196.96MB |
| row-size=12B cardinality=2.88M |
+---------------------------------------------------------------------------+
After, the two stages are partitioned by both analytic partition columns:
+---------------------------------------------------------------------------+
| Explain String |
+---------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=61.50MB Threads=10 |
| Per-Host Resource Estimates: Memory=209MB |
| |
| PLAN-ROOT SINK |
| | |
| 07:EXCHANGE [UNPARTITIONED] |
| | |
| 04:ANALYTIC |
| | functions: count(*) |
| | partition by: ss_sold_date_sk, ss_store_sk |
| | row-size=28B cardinality=2.88M |
| | |
| 03:SORT |
| | order by: ss_sold_date_sk ASC NULLS FIRST, ss_store_sk ASC NULLS FIRST |
| | row-size=20B cardinality=2.88M |
| | |
| 06:EXCHANGE [HASH(ss_sold_date_sk,ss_store_sk)] |
| | |
| 02:ANALYTIC |
| | functions: count(*) |
| | partition by: ss_addr_sk, ss_store_sk |
| | row-size=20B cardinality=2.88M |
| | |
| 01:SORT |
| | order by: ss_addr_sk ASC NULLS FIRST, ss_store_sk ASC NULLS FIRST |
| | row-size=12B cardinality=2.88M |
| | |
| 05:EXCHANGE [HASH(ss_addr_sk,ss_store_sk)] |
| | |
| 00:SCAN HDFS [tpcds_parquet.store_sales] |
| HDFS partitions=1824/1824 files=1824 size=196.96MB |
| row-size=12B cardinality=2.88M |
+---------------------------------------------------------------------------+
Testing:
Ran exhaustive tests.
Change-Id: Ia88d9494c566b984c18f4b051c2d76f389078dd9
Reviewed-on: http://gerrit.cloudera.org:8080/16022
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
c62a6808fc
IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
Defined the BloomFilter class as a wrapper of kudu::BlockBloomFilter. impala::BloomFilter builds the runtime bloom filter using the kudu::BlockBloomFilter APIs, with FastHash as the default hash algorithm. Removed the duplicated functions from the impala::BloomFilter class. Pushed down the bloom filter to Kudu through the Kudu client API.

Added a new query option ENABLED_RUNTIME_FILTER_TYPES to set the enabled runtime filter types, which only affects the Kudu scan node for now. By default, the bloom filter is not enabled; only the min-max filter is enabled for Kudu. With this option, the user can enable bloom filters, min-max filters, or both bloom and min-max runtime filters.

Added new test cases in PlannerTest and the end-to-end runtime_filters test for pushing down the bloom filter to Kudu. Added test cases to compare the number of rows returned from the Kudu scan when applying different types of runtime filter to the same queries. Updated bloom-filter-benchmark due to the bloom-filter implementation change. Bumped the Kudu version to d652cab17.

Testing:
- Passed all exhaustive tests.

Performance benchmark:
- Ran single_node_perf_run.py on TPC-H with scale 30 for Parquet and Kudu. Verified that the new hash function and bloom-filter implementation don't cause regressions for HDFS bloom filters. For Kudu, there is one regression for query TPCH-Q9 and there are improvements for about 8 queries when applying both bloom and min-max filters. The bloom filter reduces the number of rows returned from the Kudu scan, and hence reduces the cost of aggregation and hash join, but bloom filter evaluation adds extra cost to the Kudu scan, which offsets the gain on aggregation and join. The Kudu scan needs to be optimized for bloom filters in following tasks.
- Ran bloom-filter microbenchmarks and verified that there is no regression for the Insert/Find/Union functions with or without AVX2 due to the bloom-filter implementation changes. There is a small performance degradation for the Init function, but this function is not on the hot path.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Reviewed-on: http://gerrit.cloudera.org:8080/15683
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
e5777f0eb8
IMPALA-8834: Short-circuit partition key scan
This adds a new version of the pre-existing partition key scan optimization that always returns correct results, even when files have zero rows. This new version is always enabled by default. The old existing optimization, which does a metadata-only query, is still enabled behind the OPTIMIZE_PARTITION_KEY_SCANS query option.

The new version of the optimization must scan the files to see if they are non-empty. Instead of using metadata only, the planner instructs the backend to short-circuit HDFS scans after a single row has been returned from each file. This gives results equivalent to returning all the rows from each file, because all rows in the file belong to the same partition and therefore have identical values for any columns that are partition key values. Planner cardinality estimates are adjusted accordingly to enable potentially better plans and other optimisations like disabling codegen. We make some effort to avoid generating extra scan ranges for remote scans by only generating one range per remote file.

The backend optimisation is implemented by constructing a row batch with capacity for a single row only and then terminating each scan range once a single row has been produced. Both Parquet and ORC have optimized code paths for zero slot table scans that mean this will only result in a footer read. (Other file formats still need to read some portion of the file, but can terminate early once one row has been produced.) This should be quite efficient in practice with file handle caching and data caching enabled, because it then only requires reading the footer from the cache for each file.

The partition key scan optimization is also slightly generalised to apply to scans of unpartitioned tables where no slots are materialized. A limitation of the optimization where it did not apply to multiple grouping classes was also fixed.

Limitations:
* This still scans every file in the partition. I.e. there is no short-circuiting if a row has already been found in the partition by the current scan node.
* Resource reservations and estimates for the scan node do not all take into account this optimisation, so are conservative - they assume the whole file is scanned.

Testing:
* Added end-to-end tests that execute the query on all HDFS file formats and verify that the correct number of rows flow through the plan.
* Added planner test based on the existing partition key scan test.
* Added test to make sure single node optimisation kicks in when expected.
* Add test for cardinality estimates with and without stats
* Added test for unpartitioned tables.
* Added planner test that checks that optimisation is enabled for multiple aggregation classes.
* Added a targeted perf test.

Change-Id: I26c87525a4f75ffeb654267b89948653b2e1ff8c
Reviewed-on: http://gerrit.cloudera.org:8080/13993
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
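For illustration (hypothetical table partitioned by year and month), this is the kind of query the optimization targets: only partition columns are referenced, so each file only needs to produce a single row for the result to be correct.

  SELECT DISTINCT year, month
  FROM web_logs;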
7d260b6028
IMPALA-9727: Fix HBaseScanNode explain formatting
In the case with more than one hbase predicate the indentation level wasn't correctly formatted in the explain string.

Instead of:
  | |  13:SCAN HBASE [default.dimension d]
  | |     hbase filters:
  | |       d:foo EQUAL '1'
  | |       d:bar EQUAL '2'
  | |       d:baz EQUAL '3'
  | |     predicate:

This was produced:
  | |  13:SCAN HBASE [default.dimension d]
  | |     hbase filters: d:foo EQUAL '1' d:bar EQUAL '2' d:baz EQUAL '3'
  | |     predicate:

Change-Id: I30fad791408a1f7e35e9b3f2e6cb4958952dd567
Reviewed-on: http://gerrit.cloudera.org:8080/15749
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
dcf497910a
IMPALA-9736: fix mt_dop not supported error
The error was not accurate, because joins are now supported. Also updated it to refer to DML statements instead of table sinks to be more user-appropriate.

Change-Id: I8eb8106f86c47a14cc951c4a77966fe51b5c30e3
Reviewed-on: http://gerrit.cloudera.org:8080/15884
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
d0325b2ac1
IMPALA-9539: Enable CNF rewrites by default
This patch enables the conjunctive normal form rewrites by default by setting enable_cnf_rewrites to true.

Since the CNF rule does an explicit analyze of the predicate if it was not previously analyzed, we were previously returning the analyzed predicate even in the case where no rewrite was done. This causes some side effects, so it is fixed by returning the original un-analyzed predicate when no rewrite is done.

Other functional and performance testing with this flag set to true did not uncover major regressions and showed significant performance gains for queries with disjunctions in the tpch and tpcds suites.

Testing:
- Updated the PlannerTest tests with plan changes in various test suites. Removed previously added tpch tests which were explicitly setting this flag to true.
- I had previously added a test in convert-to-cnf.test with enable_cnf_rewrites=false, so I did not add any new tests with this flag disabled.

Change-Id: I4dde86e092c61d71ddf9081f768072ced470b589
Reviewed-on: http://gerrit.cloudera.org:8080/15807
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
62080883fe
IMPALA-9693: Analyze predicate in CNF rule if not previously done
The OrderByElement's expr that is used during the rewrite phase was not analyzed, which causes an INVALID_TYPE assert when the CNF rule tries to process the predicate within the ORDER BY. This patch fixes the problem by doing an explicit analyze of the compound predicate in the CNF rule. This is a conservative approach such that it can detect other such un-analyzed predicates that may be passed in from any other clauses.

An alternate attempt at trying to replace the OrderByElement's expr with an analyzed version works for this scenario but causes test failures in ExprRewriterTest, so instead I have opted for this approach.

Testing:
- Added tests with compound predicate in the ORDER BY either in the main query block or within analytic function.
- Ran 'mvn test' for FE.

Change-Id: Iff71871bd69a068f4b5807161cffa7a49d76226d
Reviewed-on: http://gerrit.cloudera.org:8080/15815
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
b2d9901fb8
IMPALA-9176: shared null-aware anti-join build
This switches null-aware anti-join (NAAJ) to use shared
join builds with mt_dop > 0. To support this, we
make all access to the join build data structures
from the probe read-only. NAAJ requires iterating
over rows from build partitions at various steps
in the algorithm and before this patch this was not
thread-safe. We avoided that problem by having a
separate builder for each join node and duplicating
the data.
The main challenge was iteration over
null_aware_partition()->build_rows() from the probe
side, because it uses an embedded iterator in the
stream so was not thread-safe (since each thread
would be trying to use the same iterator).
The solution is to extend BufferedTupleStream to
allow multiple read iterators into a pinned,
read-only, stream. Each probe thread can then
iterate over the stream independently with no
thread safety issues.
With BufferedTupleStream changes, I partially abstracted
ReadIterator more from the rest of BufferedTupleStream,
but decided not to completely refactor so that this patchset
didn't cause excessive churn. I.e. much BufferedTupleStream
code still accesses internal fields of ReadIterator.
Fix a pre-existing bug in grouping-aggregator where
Spill() hit a DCHECK because the hash table was
destroyed unnecessarily when it hit an OOM. This was
flushed out by the parameter change in test_spilling.
Testing:
Add test to buffered-tuple-stream-test for multiple readers
to BTS.
Tweaked test_spilling_naaj_no_deny_reservation to have
a smaller minimum reservation, required to keep the
test passing with the new, lower, memory requirement.
Updated a TPC-H planner test where resource requirements
slightly decreased for the NAAJ.
Ran the naaj tests in test_spilling.py with TSAN enabled,
confirmed no data races.
Ran exhaustive tests, which passed after fixing IMPALA-9611.
Ran core tests with ASAN.
Ran backend tests with TSAN.
Perf:
I ran this query that exercises EvaluateNullProbe() heavily.
select l_orderkey, l_partkey, l_suppkey, l_linenumber
from tpch30_parquet.lineitem
where l_suppkey = 4162 and l_shipmode = 'AIR'
and l_returnflag = 'A' and l_shipdate > '1993-01-01'
and if(l_orderkey > 5500000, NULL, l_orderkey) not in (
select if(o_orderkey % 2 = 0, NULL, o_orderkey + 1)
from orders
where l_orderkey = o_orderkey)
order by 1,2,3,4;
It went from ~13s to ~11s running on a single impalad with
this change, because of the inlining of CreateOutputRow() and
EvalConjuncts().
I also ran TPC-H SF 30 on Parquet with mt_dop=4, and there was
no change in performance.
Change-Id: I95ead761430b0aa59a4fb2e7848e47d1bf73c1c9
Reviewed-on: http://gerrit.cloudera.org:8080/15612
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
293dc2ec92
IMPALA-9620: Ensure group-by and cnf exprs are analyzed
This change initializes the SelectStmt's groupingExprs_ with the analyzed version. It also analyzes the new predicates created by the Conjunctive Normal Form rewrite rule such that potential consumers of this rewrite don't encounter problems.

Before this change, SelectStmt.analyzeGroupingExprs() made a deep copy of the original grouping exprs, then analyzed the copy but left the original intact. This causes problems because a rewrite rule (invoked by SelectStmt.rewriteExprs()) may try to process the original grouping exprs and encounter INVALID_TYPE (types are only assigned after analyze). This was the root cause of the problem described in the JIRA. Although this was a pre-existing behavior, it gets exposed when enable_cnf_rewrites=true. Note that the deep-copied analyzed grouping exprs are supplied to MultiAggregateInfo and since many operations are using this data structure, we don't see widespread issues.

This patch fixes it and, as a conservative measure, does the analyze of new predicates in the CNF rule. (Note: there are likely other rewrite rules where an explicit analyze should be done, but that is outside the scope of this issue.)

Testing:
- Added new unit tests with predicates in SELECT and GROUP BY
- Ran 'mvn test' for the FE

Change-Id: I6da4a17c6e648f466ce118c4646520ff68f9878e
Reviewed-on: http://gerrit.cloudera.org:8080/15693
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
3bd5c4aaa2
IMPALA-9429: Unioned partition columns break partition pruning
In the case of a union query where predicates are pushed into the union, predicate Exprs can contain SlotRefs that are transformed into constants after analysis and become eligible for constant folding. During partition pruning there is a check that eligible constant folding has already occurred; this check was failing and reporting IllegalStateException since the surrounding code only handles specific cases. This fix adds FoldConstantsRule to the rewriter used by HdfsPartitionPruner.prunePartitions so that pruning predicates have any constants folded as expected.

Testing: Test cases added to PlannerTest/union.test based on the provided repro using the alltypes tables.

Change-Id: I1c1384c2cd1ad5f7024449196f9a348ecdccb60b
Reviewed-on: http://gerrit.cloudera.org:8080/15371
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
8aa0652871
IMPALA-9484: Full ACID Milestone 1: properly scan files that has full ACID schema
Full ACID row format looks like this:
{
"operation": 0,
"originalTransaction": 1,
"bucket": 536870912,
"rowId": 0,
"currentTransaction": 1,
"row": {"i": 1}
}
User columns are nested under "row". In the frontend we need to create
slot descriptors that correspond to the file schema. In the catalog we
could mimic the file schema but that would introduce several
complexities and corner cases in column resolution. Also in query
results the heading of the above user column would be "row.i". Star
expansion should also be modified, etc.
Because of that in the Catalog I create the exact opposite of the above
schema:
{
"row__id":
{
"operation": 0,
"originalTransaction": 1,
"bucket": 536870912,
"rowId": 0,
"currentTransaction": 1
}
"i": 1
}
This way very little modification is needed in the frontend. And the
hidden columns can be easily retrieved via 'SELECT row__id.*' when we
need those for debugging/testing.
We only need to change Path.getAbsolutePath() to return a schema path
that corresponds to the file schema. Also in the backend we need some
extra juggling in OrcSchemaResolver::ResolveColumn() to retrieve the
table schema path from the file schema path.
Testing:
I changed data loading to load ORC files in full ACID format by default.
With this change we should be able to scan full ACID tables that are
not minor-compacted, don't have deleted rows, and don't have original
files.
Newly added Tests:
* specific queries about hidden columns (full-acid-rowid.test)
* SHOW CREATE TABLE (show-create-table-full-acid.test)
* DESCRIBE [FORMATTED] TABLE (describe-path.test)
* INSERT should be forbidden (acid-negative.test)
* added tests for column masking (
ranger_column_masking_complex_types.test)
Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Reviewed-on: http://gerrit.cloudera.org:8080/15395
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
ab7e209d1b
IMPALA-9099: allow mt_dop for joins without feature flag
This allows running *any* read-only query with mt_dop > 0. Before this patch, no joins were allowed with mt_dop > 0. Previous patches, particularly IMPALA-9156, added significantly more code coverage for multithreading+joins. It should be safe to allow enabling this on a query-by-query basis.

Many improvements are still planned - see IMPALA-3902. So the behaviour and performance characteristics of mt_dop > 0 with more complex plans and joins will continue to change.

Testing:
Updated the mt_dop validation tests and removed a redundant planner test that doesn't provide much additional coverage of the validation support. Ran exhaustive tests.

Change-Id: I9c6566abb239db0e775f2beaa25a62c36313cd6f
Reviewed-on: http://gerrit.cloudera.org:8080/15545
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
1411ca6a00
IMPALA-9183: Convert disjunctive predicates to conjunctive normal form
Added an expression rewrite rule to convert a disjunctive predicate to
conjunctive normal form (CNF). Converting to CNF enables multi-table
predicates that were only evaluated by a Join operator to be converted
into either single-table conjuncts that are eligible for predicate pushdown
to the scan operator or other multi-table conjuncts that are eligible to
be pushed to a Join below. This helps improve performance for such queries.
Since converting to CNF expands the number of expressions, we place a
limit on the maximum number of CNF exprs (each AND is counted as 1 CNF expr)
that are considered. Once the MAX_CNF_EXPRS limit (default is unlimited) is
exceeded, whatever expression was supplied to the rule is returned without
further transformation. A setting of -1 or 0 allows unlimited number of
CNF exprs to be created upto int32 max. Another option ENABLE_CNF_REWRITES
enables or disables the entire rewrite. This is False by default until we
have done more thorough functional testing (tracking JIRA IMPALA-9539).
Examples of rewrites:
original: (a AND b) OR c
rewritten: (a OR c) AND (b OR c)
original: (a AND b) OR (c AND d)
rewritten: (a OR c) AND (a OR d) AND (b OR c) AND (b OR d)
original: NOT(a OR b)
rewritten: NOT(a) AND NOT(b)
Testing:
- Added new unit tests with variations of disjunctive predicates
and verified their Explain plans
- Manually tested the result correctness on impala shell by running
these queries with ENABLE_CNF_REWRITES enabled and disabled
- Added TPC-H q7, q19 and TPC-DS q13 with the CNF rewrite enabled
- Preliminary performance testing of TPC-DS q13 on a 10TB scale factor
shows almost 5x improvement:
Original baseline: 47.5 sec
With this patch and CNF rewrite enabled: 9.4 sec
Change-Id: I5a03cd7239333aaf375416ef5f2b7608fcd4a072
Reviewed-on: http://gerrit.cloudera.org:8080/15462
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
b865de7d97
IMPALA-8533: Impala daemon crash on sort
This crash was caused by an empty sort tuple descriptor that was generated as a result of union substitutions replacing all sort fields with literals that were subsequently removed from the ordering spec. There was no check in place to prevent the empty tuple descriptor from being sent to impalad where it caused a divide-by-zero crash.

Fix: This fix avoids inserting a sort node when there are no fields remaining to sort on. Also added a precondition to the SortNode that will prevent similar issues from crashing impalad.

Testing: Testcases added to PlannerTest/union.test

Change-Id: If19303fbf55927c1e1b76b9b22ab354322b21c54
Reviewed-on: http://gerrit.cloudera.org:8080/15473
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
9cb6dabe10
IMPALA-8361: Propagate predicates of outer-joined InlineView
This is an improvement that tries to propagate predicates of the
nullable side of the outer join into inline view.
For example:
SELECT *
FROM functional.alltypessmall a
LEFT JOIN (
SELECT id, upper(string_col) AS upper_val,
length(string_col) AS len
FROM functional.alltypestiny
) b ON a.id = b.id
WHERE b.upper_val is NULL and b.len = 0
Before this change, the predicate b.len = 0 could not be migrated into the
inline view, since the view is on the nullable side of an outer join and,
if the predicate were evaluated inside the inline view, nulls would not be
rejected.
However, we can be more aggressive. In particular, some predicates that
must be evaluated at a join node can also be safely evaluated by the
outer-joined inline view. Such predicates are not marked as assigned:
they are propagated into the inline view and are also evaluated at the
join node.
We can divide these predicates into two types: those that satisfy the same
condition as Analyzer#canEvalPredicate and can be migrated into the inline
view, and those that satisfy the three conditions below and are safe to
propagate into the nullable side of an outer join.
1) The predicate is bound by tupleIds.
2) The predicate does not come from an ON clause.
3) The predicate evaluates to false when all its referenced tuples are NULL.
Therefore, 'b.upper_val is NULL' cannot be propagated to the inline view,
but 'b.len = 0' can.
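A quick, illustrative way to check condition 3 from impala-shell (not part
of the patch) is to evaluate the predicates against NULL inputs:
  -- length(NULL) = 0 yields NULL, which is treated as false, so
  -- 'b.len = 0' is null-rejecting and safe to propagate.
  SELECT length(CAST(NULL AS STRING)) = 0;
  -- upper(NULL) IS NULL yields TRUE, so 'b.upper_val is NULL' is not
  -- null-rejecting and must stay at the join.
  SELECT upper(CAST(NULL AS STRING)) IS NULL;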
Tests:
* Add plan tests in inline-view.test
* One baseline plan in inline-view.test, one in nested-collections.test
and two in predicate-propagation.test had to be updated
* Ran the full set of verifications in Impala Public Jenkins
Change-Id: I6c23a45aeb5dd1aa06a95c9aa8628ecbe37ef2c1
Reviewed-on: http://gerrit.cloudera.org:8080/15047
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
|
||
|
|
ca53f68525 |
IMPALA-9530: query option to limit preagg memory
This adds an advanced PREAGG_BYTES_LIMIT query option that allows limiting
the memory consumption of streaming preaggregation operators in a query.
It works by setting a maximum reservation on each grouping aggregator in a
preaggregation node. The aggregators switch to passthrough mode
automatically when hitting this limit, the same as if they were hitting
the query memory limit.
This does not override the minimum reservation computed for the
aggregation - if the limit is less than the minimum reservation, the
minimum reservation is used as the limit instead.
The default behaviour is unchanged.
Testing:
Add a planner test with estimates higher and lower than limit to ensure
that resource estimates correctly reflect the option.
Add an end-to-end test that verifies that the option forces passthrough
when the memory limit is hit.
Change-Id: I87f7a5c68da93d068e304ef01afbcbb0d56807d9
Reviewed-on: http://gerrit.cloudera.org:8080/15463
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
|
||
|
|
08acccf9eb |
IMPALA-9156: share broadcast join builds
The scheduler will only create one join build finstance per backend in
cases where this is supported. The builder is aware of the number of
finstances executing the probe and hands off the build data structures
to them.
Nested loop join requires minimal modifications because the build data
structures are read-only after initial construction. The only significant
change is that memory can't be transferred to the multiple consumers, so
MarkNeedsDeepCopy() needs to be used instead.
Hash join requires additional synchronisation because the spilling
algorithm mutates build-side data structures. This patch adds
synchronisation so that rebuilding spilled partitions is done in a
thread-safe manner, using a single thread. This uses the CyclicBarrier
added in an earlier patch. Threads blocked on CyclicBarrier need to be
cancellable, which is handled by cancelling the barrier when cancelling
fragments on the backend. BufferPool now correctly handles multiple
threads calling CleanPages() concurrently, which makes other methods
thread-safe.
Update planner to cost broadcast join and estimate memory consumption
based on a single instance per node. Planner estimates of the number of
instances are improved. Instead of assuming mt_dop instances per node,
use the total number of input splits (also called scan ranges in places)
as an upper bound on the number of instances generated by scans. These
instance estimates from the scan nodes are then propagated up the plan
tree in the same way as the numNodes estimates. The instance estimate for
the join build fragment is fixed to be based on the destination fragment.
The profile now correctly accounts for time waiting for the builder,
counting it in inactive time and showing it in the node timeline.
Additional improvements/cleanup to the time accounting are deferred until
IMPALA-9422.
Testing:
* Updated planner tests
* Ran a single node stress test with TPC-H and TPC-DS
* Add a targeted test for spilling broadcast joins, both repartitioning
  and not repartitioning.
* Add a targeted test for a spilling broadcast join with empty probe
* Add a targeted test for spilling broadcast join with empty build
  partitions.
* Add a broadcast join to test_cancellation and test_failpoints.
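As an illustrative (not authoritative) way to observe the new behaviour,
one can run a broadcast join under mt_dop; the tables are from the
standard tpch test schema and the mt_dop value is arbitrary:
  SET MT_DOP=4;
  SELECT count(*)
  FROM tpch.lineitem l JOIN /* +broadcast */ tpch.supplier s
    ON l.l_suppkey = s.s_suppkey;
  -- With shared broadcast builds, the join build is expected to run as a
  -- single finstance per backend (which one would check in the runtime
  -- profile), rather than one build per probe finstance.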
Perf: I did a single node run on my desktop:
+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(30) | parquet / none / none | 6.26 | -15.70% | 4.63 | -16.16% |
+----------+-----------------------+---------+------------+------------+----------------+
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+---------+
| Workload | Query | File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+---------+
| TPCH(30) | TPCH-Q21 | parquet / none / none | 24.97 | 23.25 | R +7.38% | 0.51% | 0.22% | 5 | R +6.95% | 2.31 | 27.93 |
| TPCH(30) | TPCH-Q4 | parquet / none / none | 2.83 | 2.79 | +1.31% | 1.86% | 0.36% | 5 | +1.88% | 1.15 | 1.53 |
| TPCH(30) | TPCH-Q6 | parquet / none / none | 1.28 | 1.28 | -0.01% | 1.64% | 1.63% | 5 | -0.11% | -0.58 | -0.01 |
| TPCH(30) | TPCH-Q22 | parquet / none / none | 2.65 | 2.68 | -0.94% | 0.84% | 1.46% | 5 | -0.21% | -0.87 | -1.25 |
| TPCH(30) | TPCH-Q1 | parquet / none / none | 4.69 | 4.72 | -0.56% | 1.29% | 0.52% | 5 | -1.04% | -1.15 | -0.89 |
| TPCH(30) | TPCH-Q13 | parquet / none / none | 10.64 | 10.80 | -1.48% | 0.61% | 0.60% | 5 | -1.39% | -1.73 | -3.91 |
| TPCH(30) | TPCH-Q15 | parquet / none / none | 4.11 | 4.32 | -4.92% | 0.05% | 0.40% | 5 | -4.93% | -2.31 | -27.46 |
| TPCH(30) | TPCH-Q20 | parquet / none / none | 3.47 | 3.67 | I -5.41% | 0.81% | 0.03% | 5 | I -5.70% | -2.31 | -15.75 |
| TPCH(30) | TPCH-Q17 | parquet / none / none | 7.58 | 8.14 | I -6.93% | 3.13% | 2.62% | 5 | I -9.31% | -2.02 | -3.96 |
| TPCH(30) | TPCH-Q9 | parquet / none / none | 15.59 | 17.02 | I -8.38% | 0.95% | 0.43% | 5 | I -8.92% | -2.31 | -19.37 |
| TPCH(30) | TPCH-Q14 | parquet / none / none | 2.90 | 3.25 | I -10.93% | 1.42% | 4.41% | 5 | I -10.28% | -2.31 | -5.33 |
| TPCH(30) | TPCH-Q12 | parquet / none / none | 2.69 | 3.13 | I -14.31% | 4.50% | 1.40% | 5 | I -17.79% | -2.31 | -7.80 |
| TPCH(30) | TPCH-Q16 | parquet / none / none | 2.50 | 3.03 | I -17.54% | 0.10% | 0.79% | 5 | I -20.55% | -2.31 | -49.31 |
| TPCH(30) | TPCH-Q10 | parquet / none / none | 4.76 | 5.92 | I -19.52% | 0.78% | 0.33% | 5 | I -24.31% | -2.31 | -61.63 |
| TPCH(30) | TPCH-Q2 | parquet / none / none | 2.56 | 3.33 | I -23.18% | 2.13% | 0.85% | 5 | I -30.39% | -2.31 | -28.14 |
| TPCH(30) | TPCH-Q18 | parquet / none / none | 12.59 | 16.41 | I -23.26% | 1.73% | 0.90% | 5 | I -30.43% | -2.31 | -32.36 |
| TPCH(30) | TPCH-Q11 | parquet / none / none | 1.83 | 2.41 | I -24.04% | 1.83% | 2.22% | 5 | I -30.48% | -2.31 | -20.54 |
| TPCH(30) | TPCH-Q8 | parquet / none / none | 4.43 | 5.94 | I -25.33% | 0.96% | 0.54% | 5 | I -34.54% | -2.31 | -63.01 |
| TPCH(30) | TPCH-Q5 | parquet / none / none | 3.81 | 5.37 | I -29.08% | 1.43% | 0.69% | 5 | I -41.47% | -2.31 | -53.11 |
| TPCH(30) | TPCH-Q7 | parquet / none / none | 13.34 | 21.49 | I -37.92% | 0.46% | 0.30% | 5 | I -60.69% | -2.31 | -203.08 |
| TPCH(30) | TPCH-Q3 | parquet / none / none | 4.73 | 7.73 | I -38.81% | 4.90% | 1.35% | 5 | I -61.68% | -2.31 | -26.40 |
| TPCH(30) | TPCH-Q19 | parquet / none / none | 3.71 | 6.61 | I -43.83% | 1.63% | 0.09% | 5 | I -77.12% | -2.31 | -106.61 |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+---------+
Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7
Reviewed-on: http://gerrit.cloudera.org:8080/15096
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
|
||
|
|
9672d94596 |
IMPALA-7784: Use unescaped string in partition pruning + fix duplicate unescaping of strings
String values from external systems (HDFS, Hive, Kudu, etc.) are already
unescaped, the same as string values in Thrift objects deserialized in
coordinators. We should mark needsUnescaping_ as false when creating
StringLiterals for these values (in LiteralExpr#create()). When comparing
StringLiterals in partition pruning, we should also use the unescaped
values if needsUnescaping_ is true.
Tests:
- Add tests for partition pruning on unescaped strings.
- Add test coverage for all existing code paths using LiteralExpr#create().
- Run core tests
Change-Id: Iea8070f16a74f9aeade294504f2834abb8b3b38f
Reviewed-on: http://gerrit.cloudera.org:8080/15278
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
|
||
|
|
3fb376ba54 |
IMPALA-6689: Speed up point lookup for Kudu primary key
If all primary key columns of the Kudu table are in equivalence predicates
pushed down to Kudu, Kudu will return at most one row. In this case, we
can adjust the cardinality estimation to speed up point lookup.
This patch sets the input and output cardinality to 1 if the number of
primary key columns in equivalence predicates pushed down to Kudu equals
the total number of primary key columns of the Kudu table, hence enabling
the small query optimization.
Testing:
- Added test cases in the following PlannerTest files: small-query-opt.test,
  disable-codegen.test and kudu.test.
- Passed all FE tests, including new test cases.
Change-Id: I4631cd4d1a528a1152b5cdcb268426f2ba1a0c08
Reviewed-on: http://gerrit.cloudera.org:8080/15250
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
|
||
|
|
90ab610d34 |
Convert dataload hdfs copy commands to LOAD DATA statements
The schema file allows specifying a commandline command in several of the
sections (LOAD, DEPENDENT_LOAD, etc). These are executed by
testdata/bin/generate-schema-statements.py when it is creating the SQL
files that are later executed for dataload. A fair number of tables use
this flexibility to execute hdfs mkdir and copy commands via the command
line.
Unfortunately, this is very inefficient. HDFS command line commands
require spinning up a JVM and can take over one second per command. These
commands are executed during a serial part of dataload, and they can be
executed multiple times. In short, these commands are a significant
slowdown for loading the functional tables.
This converts the hdfs command line statements to equivalent Hive
LOAD DATA LOCAL statements. These are doing the copy from an already
running JVM, so they do not need JVM startup. They also run in the
parallel part of dataload, speeding up the SQL generation part.
This speeds up generate-schema-statements.py significantly. On the
functional dataset, it saves 7 minutes.
Before:
time testdata/bin/generate-schema-statements.py -w functional-query -e exhaustive -f
real    8m8.068s
user    10m11.218s
sys     0m44.932s
After:
time testdata/bin/generate-schema-statements.py -w functional-query -e exhaustive -f
real    0m35.800s
user    0m42.536s
sys     0m5.210s
This is currently a long-pole in dataload, so it translates directly to an
overall speedup of about 7 minutes.
Testing:
- Ran debug tests
Change-Id: Icf17b85ff85618933716a80f1ccd6701b07f464c
Reviewed-on: http://gerrit.cloudera.org:8080/15228
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
|
||
|
|
0bb056e525 |
IMPALA-4224: execute separate join builds fragments
This enables parallel plans with the join build in a
separate fragment and fixes all of the ensuing fallout.
After this change, mt_dop plans with joins have separate
build fragments. There is still a 1:1 relationship between
join nodes and builders, so the builders are only accessed
by the join node's thread after it is handed off. This lets
us defer the work required to make PhjBuilder and NljBuilder
safe to be shared between nodes.
Planner changes:
* Combined the parallel and distributed planning code paths.
* Misc fixes to generate reasonable thrift structures in the
query exec requests, i.e. containing the right nodes.
* Fixes to resource calculations for the separate build plans.
** Calculate separate join/build resource consumption.
** Simplified the resource estimation by calculating resource
consumption for each fragment separately, and assuming that
all fragments hit their peak resource consumption at the
same time. IMPALA-9255 is the follow-on to make the resource
estimation more accurate.
Scheduler changes:
* Various fixes to handle multiple TPlanExecInfos correctly,
which are generated by the planner for the different cohorts.
* Add logic to colocate build fragments with parent fragments.
Runtime filter changes:
* Build sinks now produce runtime filters, which required
planner and coordinator fixes to handle.
DataSink changes:
* Close the input plan tree before calling FlushFinal() to release
resources. This depends on Send() not holding onto references
to input batches, which was true except for NljBuilder. This
invariant is documented.
Join builder changes:
* Add a common base class for PhjBuilder and NljBuilder with
functions to handle synchronisation with the join node.
* Close plan tree earlier in FragmentInstanceState::Exec()
so that peak resource requirements are lower.
* The NLJ always copies input batches, so that it can close
its input tree.
JoinNode changes:
* Join node blocks waiting for build-side to be ready,
then eventually signals that it's done, allowing the builder
to be cleaned up.
* NLJ and PHJ nodes handle both the integrated builder and
the external builder. There is a 1:1 relationship between
the node and the builder, so we don't deal with thread safety
yet.
* Buffer reservations are transferred between the builder and join
node when running with the separate builder. This is not really
necessary right now, since it is all single-threaded, but will
be important for the shared broadcast.
- The builder transfers memory for probe buffers to the join node
at the end of each build phase.
- At end of each probe phase, reservation needs to be handed back
to builder (or released).
ExecSummary changes:
* The summary logic was modified to handle connecting fragments
via join builds. The logic is an extension of what was used
for exchanges.
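A minimal sketch of how the separate build fragments surface in a plan
(illustrative only; the exact plan text may differ, and the tables are
from the standard tpch test schema):
  SET MT_DOP=4;
  EXPLAIN
  SELECT count(*)
  FROM tpch.orders o JOIN tpch.customer c ON o.o_custkey = c.c_custkey;
  -- With this change, the distributed plan for an mt_dop query is
  -- expected to place the hash join's build side in its own fragment
  -- with a join build sink, instead of embedding the build in the
  -- join's fragment.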
Testing:
* Enable --unlock_mt_dop for end-to-end tests
* Migrate some tests to run as part of end-to-end tests instead of
custom cluster.
* Add mt_dop dimension to various end-to-end tests to provide
coverage of join queries, spill-to-disk and cancellation.
* Ran a single node TPC-H and TPC-DS stress test with mt_dop=0
and mt_dop=4.
Perf:
* Ran TPC-H scale factor 30 locally with mt_dop=0. No significant
change.
Change-Id: I4403c8e62d9c13854e7830602ee613f8efc80c58
Reviewed-on: http://gerrit.cloudera.org:8080/14859
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
|
||
|
|
f38da0df8e |
IMPALA-4400: aggregate runtime filters locally
Move RuntimeFilterBank to QueryState(). Implement fine-grained locking for
each filter to mitigate any increased lock contention from the change.
Make RuntimeFilterBank handle multiple producers of the same filter, e.g.
multiple instances of a partitioned join. It computes the expected number
of filters upfront then sends the filter to the coordinator once all the
local instances have been merged together. The merging can be done in
parallel locally to improve latency of filter propagation.
Add Or() methods to MinMaxFilter and BloomFilter, since we now need to
merge those, not just the thrift versions.
Update coordinator filter routing to expect only one instance of a filter
from each producer backend and to only send one instance to each consumer
backend (instead of sending one per fragment).
Update memory reservations and estimates to be lower to account for
sharing of filters between fragment instances. mt_dop plans are modified
to show these shared and non-shared resources separately.
Enable waiting for runtime filters for kudu scanner with mt_dop.
Made min/max filters const-correct.
Testing
* Added unit tests for Or() methods.
* Added some additional e2e test coverage for mt_dop queries
* Updated planner tests with new estimates and reservation.
* Ran a single node 3-impalad stress test with TPC-H kudu and TPC-DS
  parquet.
* Ran exhaustive tests.
* Ran core tests with ASAN.
Perf
* Did a single-node perf run on TPC-H with default settings. No perf
  change.
* Single-node perf run with mt_dop=8 showed significant speedups:
+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(30) | parquet / none / none | 10.14 | -7.29% | 5.05 | -11.68% |
+----------+-----------------------+---------+------------+------------+----------------+
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+---------+
| Workload | Query | File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+---------+
| TPCH(30) | TPCH-Q7 | parquet / none / none | 38.87 | 38.44 | +1.13% | 7.17% | * 10.92% * | 20 | +0.72% | 0.72 | 0.39 |
| TPCH(30) | TPCH-Q1 | parquet / none / none | 4.28 | 4.26 | +0.50% | 1.92% | 1.09% | 20 | +0.03% | 0.31 | 1.01 |
| TPCH(30) | TPCH-Q22 | parquet / none / none | 2.32 | 2.32 | +0.05% | 2.01% | 1.89% | 20 | -0.03% | -0.36 | 0.08 |
| TPCH(30) | TPCH-Q15 | parquet / none / none | 3.73 | 3.75 | -0.42% | 0.84% | 1.05% | 20 | -0.25% | -0.77 | -1.40 |
| TPCH(30) | TPCH-Q13 | parquet / none / none | 9.80 | 9.83 | -0.38% | 0.51% | 0.80% | 20 | -0.32% | -1.30 | -1.81 |
| TPCH(30) | TPCH-Q2 | parquet / none / none | 1.98 | 2.00 | -1.32% | 1.74% | 2.81% | 20 | -0.64% | -1.71 | -1.79 |
| TPCH(30) | TPCH-Q6 | parquet / none / none | 1.22 | 1.25 | -2.14% | 2.66% | 4.15% | 20 | -0.96% | -2.00 | -1.95 |
| TPCH(30) | TPCH-Q19 | parquet / none / none | 5.13 | 5.22 | -1.65% | 1.20% | 1.40% | 20 | -1.76% | -3.34 | -4.02 |
| TPCH(30) | TPCH-Q16 | parquet / none / none | 2.46 | 2.56 | -4.13% | 2.49% | 1.99% | 20 | -4.31% | -4.04 | -5.94 |
| TPCH(30) | TPCH-Q9 | parquet / none / none | 81.63 | 85.07 | -4.05% | 4.94% | 3.06% | 20 | -5.46% | -3.28 | -3.21 |
| TPCH(30) | TPCH-Q10 | parquet / none / none | 5.07 | 5.50 | I -7.92% | 0.96% | 1.33% | 20 | I -8.51% | -5.27 | -22.14 |
| TPCH(30) | TPCH-Q21 | parquet / none / none | 24.00 | 26.24 | I -8.57% | 0.46% | 0.38% | 20 | I -9.34% | -5.27 | -67.47 |
| TPCH(30) | TPCH-Q18 | parquet / none / none | 8.66 | 9.50 | I -8.86% | 0.62% | 0.44% | 20 | I -9.75% | -5.27 | -55.17 |
| TPCH(30) | TPCH-Q3 | parquet / none / none | 6.01 | 6.70 | I -10.19% | 1.01% | 0.90% | 20 | I -11.25% | -5.27 | -35.76 |
| TPCH(30) | TPCH-Q12 | parquet / none / none | 2.98 | 3.39 | I -12.23% | 1.48% | 1.48% | 20 | I -13.56% | -5.27 | -27.75 |
| TPCH(30) | TPCH-Q11 | parquet / none / none | 1.69 | 2.00 | I -15.55% | 1.63% | 1.47% | 20 | I -18.09% | -5.27 | -34.60 |
| TPCH(30) | TPCH-Q4 | parquet / none / none | 2.42 | 2.87 | I -15.69% | 1.48% | 1.26% | 20 | I -18.61% | -5.27 | -39.50 |
| TPCH(30) | TPCH-Q14 | parquet / none / none | 4.64 | 6.27 | I -26.02% | 1.35% | 0.73% | 20 | I -35.37% | -5.27 | -94.07 |
| TPCH(30) | TPCH-Q20 | parquet / none / none | 3.19 | 4.37 | I -27.01% | 1.54% | 0.99% | 20 | I -36.85% | -5.27 | -80.74 |
| TPCH(30) | TPCH-Q5 | parquet / none / none | 4.57 | 6.39 | I -28.36% | 1.04% | 0.75% | 20 | I -39.56% | -5.27 | -120.02 |
| TPCH(30) | TPCH-Q17 | parquet / none / none | 3.15 | 4.71 | I -33.06% | 1.59% | 1.31% | 20 | I -49.43% | -5.27 | -87.64 |
| TPCH(30) | TPCH-Q8 | parquet / none / none | 5.25 | 7.95 | I -33.95% | 0.95% | 0.53% | 20 | I -51.11% | -5.27 | -185.02 |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+---------+
Change-Id: Iabeeab5eec869ff2197250ad41c1eb5551704acc
Reviewed-on: http://gerrit.cloudera.org:8080/14538
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
|
||
|
|
bb5000eec9 |
IMPALA-9272: Fix PlannerTest.testHdfs depending on year(now())
FE test PlannerTest.testHdfs depends on the result of year(now()) being
2019, which is wrong after we enter 2020. Replace it with another
expression that does not depend on now().
Change-Id: I7b3df560d69e40d3f2332ff242362bd36bbf6b64
Reviewed-on: http://gerrit.cloudera.org:8080/14965
Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
|
||
|
|
df5c406145 |
IMPALA-9162: Do not apply inferred predicate to outer joins
When the planner migrates predicates to inline views, it also creates
equivalent predicates based on the value transfer graph which is built by
transitive relationships among join conditions. These newly inferred
predicates are typically placed as 'other predicates' of an inner or outer
join. However, for outer joins, this has the effect of adding extra
predicates in the WHERE clause which is incorrect since it may filter NULL
values. Since the original query did not have null filtering conditions in
the WHERE clause, we should not add new ones.
In this fix we do the following: during the migration of conjuncts to
inline views, analyze the predicate of type A <op> B and, if it is an
inferred predicate AND either the left or right slots reference the output
tuple of an outer join, ignore the inferred predicate.
Note that simple queries with a combination of inner and outer joins may
not reproduce the problem. Due to the nature of predicate inferencing,
some combination of subqueries, inner joins and outer joins is needed. For
the query pattern, please see the example in the JIRA.
Tests:
- Added plan tests with left and right outer joins to inline-view.test
- One baseline plan in inline-view.test had to be updated
- Manually ran a few queries on impala shell to verify result correctness,
  by checking that NULL values are being produced for outer joins.
- Ran regression tests on jenkins
Change-Id: Ie9521bd768c4b333069c34d5c1e11b10ea535827
Reviewed-on: http://gerrit.cloudera.org:8080/14813
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
|
||
|
|
0f2aa50989 |
IMPALA-9146: Add a configurable limit for the size of broadcast input.
Impala's DistributedPlanner may sometimes accidentally choose broadcast
distribution for inputs that are larger than the destination executor's
total memory. This could potentially happen if the cluster membership is
not accurately known and the planner's cost computation of the
broadcastCost vs partitionCost happens to favor the broadcast
distribution. This causes spilling and severely affects performance.
Although the DistributedPlanner does a mem_limit check before picking
broadcast, the mem_limit is not an accurate reflection since it is
assigned during admission control.
As a safety net, we introduce an explicit configurable limit,
broadcast_bytes_limit, for the size of the broadcast input and set it to a
default of 32GB. The default is chosen based on analysis of existing
benchmark queries and representative workloads such that in the vast
majority of cases the parameter value does not need to be changed. If the
estimated input size on the build side is greater than this threshold, the
DistributedPlanner will fall back to a partition distribution. Setting
this parameter to 0 causes it to be ignored.
Testing:
- Ran all regression tests on Jenkins successfully
- Added a few unit tests in PlannerTest that (a) set the
  broadcast_bytes_limit to a small value and check whether the distributed
  plan does hash partitioning on the build side instead of broadcast,
  (b) pass a broadcast hint to override the config setting, (c) verify the
  standard case where the broadcast threshold is larger than the build
  input size.
Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Reviewed-on: http://gerrit.cloudera.org:8080/14690
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
|
||
|
|
d69507086d |
IMPALA-9166: undo inadvertant change testResourceRequirements
The limitation of this test output to Hive version 2 was accidentally
removed while manually merging planner tests.
Change-Id: Ia4791b117ddd42df2db5c01c529dd8ec850c5000
Reviewed-on: http://gerrit.cloudera.org:8080/14737
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
|