Files
impala/testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
Thomas Tauber-Marshall 8c2bf9769a IMPALA-2805: Order conjuncts based on selectivity and cost
Added costs to all Exprs, which estimate the relative cost of evaluating
an expression and all of its children. Costs are calculated during
analysis. For now, these costs are intended as a simple way to order
expressions from cheap to expensive, not necessarily to be a precise
reflection of running times.

In general, expressions that deal with variable length types like strings
will have higher cost than those dealing with fixed length types
like numbers and booleans. Additionally, expressions with complicated
subexpressions will have higher cost than simpler expressions.

Also added PlanNode.orderConjunctsByCost, which takes a list of Exprs and
returns a new list sorted according to an estimate of the cheapest order to
evaulate the conjuncts in, based on their cost and selectivity.

The conjuncts are sorted by repeatedly iterating over them and choosing the
conjunct that would result in the least total estimated work were it to be
applied before the remaining conjuncts. Selectivities are exponentially
backed off, and Exprs without selectivity estimates are given a reasonable
default.

Change-Id: I02279a26fbc6308ac5eb819d78345fc010469034
Reviewed-on: http://gerrit.cloudera.org:8080/2598
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:53 -07:00

77 lines
3.2 KiB
Plaintext

# Test implicit and explicit casts. Binary predicates containing an explicit cast
# cannot be offered to the data source.
select * from functional.alltypes_datasource
where tinyint_col < 256 and
float_col != 0 and
cast(int_col as bigint) < 10
---- PLAN
00:SCAN DATA SOURCE [functional.alltypes_datasource]
data source predicates: tinyint_col < 256
predicates: float_col != 0, CAST(int_col AS BIGINT) < 10
====
# The first four predicates are in a form that can be offered to the data source
# and the first and third will be accepted (it accepts every other conjunct).
# The second and fourth will be evaluated by Impala. The negated predicates,
# i.e. NOT <PREDICATE>, are not in a form able to be offered to the data source,
# so they will always be evaluated by Impala.
select * from functional.alltypes_datasource
where 10 > int_col and
5 > double_col and
string_col != "Foo" and
string_col != "Bar" and
not true = bool_col and
not 5.0 = double_col
---- PLAN
00:SCAN DATA SOURCE [functional.alltypes_datasource]
data source predicates: 10 > int_col, string_col != 'Foo'
predicates: 5 > double_col, NOT TRUE = bool_col, NOT 5.0 = double_col, string_col != 'Bar'
====
# The 3rd predicate is not in a form that can be offered to the data source so
# the 4th will be offered and accepted instead.
select * from functional.alltypes_datasource
where int_col < 10 and
double_col > 5 and
string_col in ("Foo", "Bar") and
bool_col != false
---- PLAN
00:SCAN DATA SOURCE [functional.alltypes_datasource]
data source predicates: int_col < 10, bool_col != FALSE
predicates: double_col > 5, string_col IN ('Foo', 'Bar')
====
# Tests that all predicates from the On-clause are applied (IMPALA-805)
# and that slot equivalences are enforced at lowest possible plan node
# for tables produced by a data source.
select 1 from functional.alltypes_datasource a
inner join functional.alltypes_datasource b
# equivalence class
on a.id = b.id and a.id = b.int_col and a.id = b.bigint_col
and a.tinyint_col = b.id and a.smallint_col = b.id
and a.int_col = b.id and a.bigint_col = b.id
# redundant predicates to test minimal spanning tree of equivalent slots
where a.tinyint_col = a.smallint_col and a.int_col = a.bigint_col
---- PLAN
02:HASH JOIN [INNER JOIN]
| hash predicates: a.id = b.id
| runtime filters: RF000 <- b.id
|
|--01:SCAN DATA SOURCE [functional.alltypes_datasource b]
|--predicates: b.id = b.int_col, b.id = b.bigint_col
|
00:SCAN DATA SOURCE [functional.alltypes_datasource a]
predicates: a.id = a.int_col, a.tinyint_col = a.smallint_col, a.int_col = a.bigint_col, a.id = a.tinyint_col
====
# Tests that <=>, IS DISTINCT FROM, and IS NOT DISTINCT FROM all can be offered to the
# data source.
select * from functional.alltypes_datasource
where id <=> 1
and bool_col <=> true
and tinyint_col IS DISTINCT FROM 2
and smallint_col IS DISTINCT FROM 3
and int_col is not distinct from 4
and bigint_col is not distinct from 5
---- PLAN
00:SCAN DATA SOURCE [functional.alltypes_datasource]
data source predicates: id IS NOT DISTINCT FROM 1, tinyint_col IS DISTINCT FROM 2, int_col IS NOT DISTINCT FROM 4
predicates: bool_col IS NOT DISTINCT FROM TRUE, smallint_col IS DISTINCT FROM 3, bigint_col IS NOT DISTINCT FROM 5
====