impala

mirror of https://github.com/apache/impala.git synced 2026-01-06 06:01:03 -05:00

Author	SHA1	Message	Date
Tim Armstrong	c4d284f3cc	IMPALA-5483: Automatically disable codegen for small queries This is similar to the single-node execution optimisation, but applies to slightly larger queries that should run in a distributed manner but won't benefit from codegen. This adds a new query option disable_codegen_rows_threshold that defaults to 50,000. If fewer than this number of rows are processed by a plan node per impalad, the cost of codegen almost certainly outweighs the benefit. Using rows processed as a threshold is justified by a simple model that assumes the cost of codegen and execution per row for the same operation are proportional. E.g. if x is the complexity of the operation, n is the number of rows processed, C is a constant factor giving the cost of codegen, and Ec and Ei are constant factors giving the cost of codegen'd and interpreted execution, then the cost of the codegen'd operator is C * x + Ec * x * n and the cost of the interpreted operator is Ei * x * n. Rearranging shows that interpretation is cheaper if n < C / (Ei - Ec), i.e. that (at least with the simplified model) it makes sense to choose interpretation or codegen based on a constant threshold. The model also implies that it is somewhat safer to choose codegen because the additional cost of codegen is O(1) but the additional cost of interpretation is O(n). I ran some experiments with TPC-H Q1, varying the input table size, to determine the cutover point at which codegen became beneficial. The cutover was around 150k rows per node for both text and parquet. At 50k rows per node disabling codegen was very beneficial - around 0.12s versus 0.24s. To be somewhat conservative I set the default threshold to 50k rows. On more complex queries, e.g. TPC-H Q10, the cutover tends to be higher because there are plan nodes that process many fewer than the max rows. 
Fix a couple of minor issues in the frontend - the numNodes_ calculation could return 0 for Kudu, and the single node optimization didn't correctly handle a scan node with conjuncts, a limit and missing stats (it considered the estimate still valid). Testing: Updated e2e tests that set disable_codegen to set disable_codegen_rows_threshold to 0, so that those tests run both with and without codegen still. Added an e2e test to make sure that the optimisation is applied in the backend. Added planner tests for various cases where codegen should and shouldn't be disabled. Perf: Added a targeted perf test for a join+agg over a small input, which benefits from this change. Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e Reviewed-on: http://gerrit.cloudera.org:8080/7153 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-06-29 21:14:59 +00:00
Michael Ho	91237051af	IMPALA-4164: Avoid overly aggressive inlining in LLVM IR When generating IR functions during codegen, we used to always tag the functions with the "AlwaysInline" attribute. That potentially leads to excessive inlining, causing very long optimization / compilation time with marginal performance benefit at runtime. One of the reasons for doing it was that the "target-cpu" and "target-features" attributes were missing in the generated IR functions so the LLVM inliner considers them incompatible with the cross-compiled functions. As a result, the inliner will not inline the generated IR functions into cross-compiled functions and vice versa unless the "AlwaysInline" attributes exist. This change fixes the problem above by setting the "target-cpu" and "target-features" attributes of all IR functions to match those of the host's CPU so both generated IR functions and cross-compiled functions will have the same values for those attributes. With these attributes set, we now rely on the inliner of LLVM to determine whether a function is worth being inlined. With this change, the codegen time of a query with a very long predicate went from 15s to 4s and the overall runtime went from 19s to 8s. Change-Id: I2d87ae8d222b415587e7320cb9072e4a8d6615ce Reviewed-on: http://gerrit.cloudera.org:8080/6941 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Impala Public Jenkins	2017-06-08 00:16:12 +00:00
Tim Armstrong	10fa472fa6	IMPALA-4302,IMPALA-2379: constant expr arg fixes This patch fixes two issues around handling of constant expr args. The patches are combined because they touch some of the same code and depend on some of the same memory management cleanup. First, it fixes IMPALA-2379, where constant expr args were not visible to UDAFs. The issue is that the input exprs need to be opened before calling the UDAF Init() function. Second, it avoids overhead from repeated evaluation of constant arguments for ScalarFnCall expressions on both the codegen'd and interpreted paths. A common example is an IN predicate with a long list of constant values. The interpreted path was inefficient because it always evaluated all children expressions. Instead in this patch constant args are evaluated once and cached. The memory management of the AnyVal* objects was somewhat nebulous - adjusted it so that they're allocated from ExprContext::mem_pool_, which has the correct lifetime. The codegen'd path was inefficient only with varargs - with fixed arguments the LLVM optimiser is able to infer after inlining that the expressions are constant and remove all evaluation. However, for varargs it stores the vararg values into a heap-allocated buffer. The LLVM optimiser is unable to remove these stores because they have a side-effect that is visible to code outside the function. The codegen'd path is improved by evaluating varargs into an automatic buffer that can be optimised out. We also make a small related change to bake the string constants into the codegen'd code. Testing: Ran exhaustive build. Added regression test for IMPALA-2379 and MemPool test for aligned allocation. Added a test for IN predicates with constant strings. Perf: Added a targeted query that demonstrates the improvement. Also manually validated the non-codegen'd perf. Also ran TPC-H and targeted perf queries locally - didn't see any significant changes. 
+--------------------+-------------------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| Workload           | Query                         | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters |
+--------------------+-------------------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| TARGETED-PERF(_20) | primitive_filter_in_predicate | parquet / none / none | 1.19   | 9.82        | I -87.85%  | 3.82%     | 0.71%          | 1           | 10    |
+--------------------+-------------------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
(I) Improvement: TARGETED-PERF(_20) primitive_filter_in_predicate [parquet / none / none] (9.82s -> 1.19s [-87.85%])
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+--------+-----------+
| Operator     | % of Query | Avg      | Base Avg | Delta(Avg) | StdDev(%) | Max      | Base Max | Delta(Max) | #Hosts | #Rows  | Est #Rows |
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+--------+-----------+
| 01:AGGREGATE | 14.39%     | 155.88ms | 214.61ms | -27.37%    | 2.68%     | 163.38ms | 227.53ms | -28.19%    | 1      | 1      | 1         |
| 00:SCAN HDFS | 85.60%     | 927.46ms | 9.43s    | -90.16%    | 4.49%     | 1.01s    | 9.50s    | -89.42%    | 1      | 13.77K | 14.05K    |
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+--------+-----------+
Change-Id: I45c3ed8c9d7a61e94a9b9d6c316e8a53d9ff6c24 Reviewed-on: http://gerrit.cloudera.org:8080/4838 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-11-08 02:44:51 +00:00
Tim Armstrong	f613dcd02d	Add functional and targeted perf tests for joins with empty builds I wrote these tests for my IMPALA-3987 patch, but other issues block that optimisations. These tests exercise an interesting corner case so I split them out into a separate patch. The functional tests exercise every join mode for nested loop join and hash join with an empty build side. The perf test exercises hash join with an empty build side. Testing: Made sure the tests passed with both partitioned and non-partitioned hash join implementations. Ran the targeted perf query through the single node perf run script to make sure it worked. Change-Id: I0a68cafec32011a47c569b254979601237e7f2a5 Reviewed-on: http://gerrit.cloudera.org:8080/4051 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-08-19 06:04:18 +00:00
Thomas Tauber-Marshall	11ea79c525	Renamed conjunct_ordering.test to primitive_conjunct_ordering.test in targeted-perf This is needed because the workload runner required a prefix of query names to run. Change-Id: Ica8db68141ef653b0b01a7cfa7773302717a35a2 Reviewed-on: http://gerrit.cloudera.org:8080/3021 Tested-by: Internal Jenkins Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>	2016-05-14 01:30:00 -07:00
Thomas Tauber-Marshall	8c2bf9769a	IMPALA-2805: Order conjuncts based on selectivity and cost Added costs to all Exprs, which estimate the relative cost of evaluating an expression and all of its children. Costs are calculated during analysis. For now, these costs are intended as a simple way to order expressions from cheap to expensive, not necessarily to be a precise reflection of running times. In general, expressions that deal with variable length types like strings will have higher cost than those dealing with fixed length types like numbers and booleans. Additionally, expressions with complicated subexpressions will have higher cost than simpler expressions. Also added PlanNode.orderConjunctsByCost, which takes a list of Exprs and returns a new list sorted according to an estimate of the cheapest order to evaluate the conjuncts in, based on their cost and selectivity. The conjuncts are sorted by repeatedly iterating over them and choosing the conjunct that would result in the least total estimated work were it to be applied before the remaining conjuncts. Selectivities are exponentially backed off, and Exprs without selectivity estimates are given a reasonable default. Change-Id: I02279a26fbc6308ac5eb819d78345fc010469034 Reviewed-on: http://gerrit.cloudera.org:8080/2598 Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com> Tested-by: Internal Jenkins	2016-05-12 14:17:53 -07:00
Martin Grund	89113544cd	Fix targeted perf queries to deal with run-workload.py limitations. This patch fixes the comment style in the queries to work properly with the limitations of the run-workload.py script. This includes removing quotes and + from comments that otherwise get interpreted. Change-Id: I791e7bd4145717aa0628c56b93582cd207195039 Reviewed-on: http://gerrit.cloudera.org:8080/1689 Reviewed-by: Martin Grund <mgrund@cloudera.com> Tested-by: Internal Jenkins	2016-01-05 00:52:38 +00:00
Mostafa Mokhtar	f79a021cce	Add targeted perf queries for nightly performance runs Tests were tuned to run on a 9x node cluster with 64GB RAM against TPC-H 300GB database. Change-Id: Ib421bcd463d370f795a235b755aeb24a6a70f705 Reviewed-on: http://gerrit.cloudera.org:8080/1394 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-12-29 05:04:10 +00:00
Skye Wanderman-Milne	c79cd3aa23	Add targeted-perf query that makes local expr allocations Change-Id: Ida40481cb429227058d78c619820de23f5c4a15e Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4772 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-10-07 15:48:32 -07:00
Lenni Kuff	c3619d9581	Add targeted perf queries for columns materialized from inline-view The planner should see that only c1 and c2 are being materialized from the inline-view in these queries. This will provide a significant performance improvement on Parquet format tables. Change-Id: If9a366000531a8383dc20ad6f40456ace2281b7d Reviewed-on: http://gerrit.ent.cloudera.com:8080/1017 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:17 -08:00
Lenni Kuff	11556a1ad2	Add targeted perf regression test for IMPALA-288	2014-01-08 10:50:13 -08:00
ishaan	15658f384b	Include targeted performance tests in experiments and add a new query	2014-01-08 10:49:02 -08:00
Skye Wanderman-Milne	461a48df2b	Refactor testing framework to generate Avro tables.	2014-01-08 10:48:45 -08:00
Lenni Kuff	328ceed4e7	Add support for generating lzo compressed text files and running tests against lzo	2014-01-08 10:48:38 -08:00
Lenni Kuff	90d7e085fa	Update tests to use num_nodes=0, use external impala cluster, add sanity check run mode	2014-01-08 10:48:38 -08:00
Skye Wanderman-Milne	6c08716439	IMPALA-92: Significant performance difference between LIKE = 'x' AND = 'x'	2014-01-08 10:48:21 -08:00
ishaan	09d6d931f4	Change the way data is loaded	2014-01-08 10:48:09 -08:00
Lenni Kuff	3fb375cdc4	Add initial set of queries for targeted perf workload Includes a query that runs a simple "limit 0" as well as queries that perform aggregation on columns with different numbers of GROUP BY groups.	2014-01-08 10:47:23 -08:00
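
Note on IMPALA-5483 (c4d284f3cc): the cost model in that commit message can be written out as a short sketch. Only the two cost formulas and the threshold n < C / (Ei - Ec) come from the commit message. The function names and the numeric constants used below are hypothetical and chosen for illustration; they are not measured values and this is not Impala's code.

```python
def codegen_cost(C, Ec, x, n):
    """Cost of the codegen'd operator: C * x + Ec * x * n."""
    return C * x + Ec * x * n


def interpreted_cost(Ei, x, n):
    """Cost of the interpreted operator: Ei * x * n."""
    return Ei * x * n


def break_even_rows(C, Ei, Ec):
    """Row count below which interpretation is cheaper: C / (Ei - Ec).

    The complexity x cancels out, which is why a constant row threshold
    is enough under this simplified model.
    """
    if Ei <= Ec:
        # Assumption: codegen'd execution must be cheaper per row,
        # otherwise codegen never pays off and no threshold exists.
        raise ValueError("Ei must be greater than Ec")
    return C / (Ei - Ec)


# Arbitrary illustrative constants, not measurements.
C, Ei, Ec = 1000.0, 3.0, 1.0
threshold = break_even_rows(C, Ei, Ec)
```

With these placeholder constants the threshold is the same for any complexity x: below it the interpreted cost is lower, above it the codegen'd cost is lower.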

18 Commits
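
Note on IMPALA-2805 (8c2bf9769a): the greedy ordering described in that commit message can be illustrated with a minimal sketch. This is not the actual PlanNode.orderConjunctsByCost implementation. The work estimate, the backoff formula (selectivity raised to 1/k at the k-th pick) and the default selectivity of 0.1 are all assumptions made for illustration, as are the names `Conjunct` and `order_conjuncts_by_cost`.

```python
from dataclasses import dataclass
from typing import List, Optional

DEFAULT_SELECTIVITY = 0.1  # assumed default for conjuncts without an estimate


@dataclass
class Conjunct:
    name: str
    cost: float                          # relative evaluation cost
    selectivity: Optional[float] = None  # fraction of rows that pass, if known


def order_conjuncts_by_cost(conjuncts: List[Conjunct]) -> List[Conjunct]:
    """Greedily pick the conjunct that minimises estimated total work."""
    remaining = list(conjuncts)
    ordered: List[Conjunct] = []
    while remaining:
        k = len(ordered) + 1  # position of the conjunct being picked
        total_cost = sum(c.cost for c in remaining)

        def work(c: Conjunct) -> float:
            sel = DEFAULT_SELECTIVITY if c.selectivity is None else c.selectivity
            backed_off = sel ** (1.0 / k)  # assumed exponential backoff
            # Pay for c on every row, then for the rest on surviving rows only.
            return c.cost + backed_off * (total_cost - c.cost)

        best = min(remaining, key=work)
        remaining.remove(best)
        ordered.append(best)
    return ordered


example = order_conjuncts_by_cost([
    Conjunct("cheap_unselective", cost=1.0, selectivity=0.9),
    Conjunct("pricier_selective", cost=2.0, selectivity=0.01),
])
```

In this example the more expensive conjunct is placed first because it filters out nearly all rows, so the cheaper one runs on very few of them.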